A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts.
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. Star Hotels Group, which operates a chain of hotels in Portugal, is facing a high number of booking cancellations and has reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
# this will help in making the Python code more structured automatically (good coding practice)
%load_ext nb_black
# Library to suppress warnings or deprecation notes
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Library to split data
from sklearn.model_selection import train_test_split
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.linear_model import LogisticRegression
# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# To tune different models
from sklearn.model_selection import GridSearchCV
# To perform statistical analysis
import scipy.stats as stats
# To get different metric scores
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    plot_confusion_matrix,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)
# read the data
data = pd.read_csv("StarHotelsGroup.csv")
# make a copy of the dataset so that we do not modify the original data
df = data.copy()
# view a random sample of the dataset
df.sample(n=10)
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24011 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 17 | 2018 | 12 | 8 | Online | 0 | 0 | 0 | 91.38 | 0 | Not_Canceled |
| 30286 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 305 | 2018 | 11 | 4 | Offline | 0 | 0 | 0 | 89.00 | 0 | Canceled |
| 12581 | 3 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 246 | 2019 | 3 | 16 | Online | 0 | 0 | 0 | 122.40 | 0 | Canceled |
| 54070 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 4 | 4 | 2018 | 5 | 31 | Online | 0 | 0 | 0 | 132.08 | 1 | Not_Canceled |
| 30459 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 213 | 2018 | 6 | 7 | Offline | 0 | 0 | 0 | 130.00 | 0 | Canceled |
| 26338 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 305 | 2018 | 11 | 4 | Offline | 0 | 0 | 0 | 89.00 | 0 | Canceled |
| 46616 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 182 | 2018 | 10 | 12 | Online | 0 | 0 | 0 | 98.10 | 0 | Canceled |
| 22037 | 3 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 4 | 133 | 2018 | 10 | 10 | Online | 0 | 0 | 0 | 151.20 | 3 | Not_Canceled |
| 28982 | 1 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 67 | 2018 | 5 | 31 | Online | 0 | 0 | 0 | 140.40 | 1 | Not_Canceled |
| 50005 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 22 | 2019 | 2 | 17 | Online | 0 | 0 | 0 | 118.00 | 0 | Canceled |
# view shape of dataset
df.shape
(56926, 18)
# view info of dataset
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56926 entries, 0 to 56925
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          56926 non-null  int64
 1   no_of_children                        56926 non-null  int64
 2   no_of_weekend_nights                  56926 non-null  int64
 3   no_of_week_nights                     56926 non-null  int64
 4   type_of_meal_plan                     56926 non-null  object
 5   required_car_parking_space            56926 non-null  int64
 6   room_type_reserved                    56926 non-null  object
 7   lead_time                             56926 non-null  int64
 8   arrival_year                          56926 non-null  int64
 9   arrival_month                         56926 non-null  int64
 10  arrival_date                          56926 non-null  int64
 11  market_segment_type                   56926 non-null  object
 12  repeated_guest                        56926 non-null  int64
 13  no_of_previous_cancellations          56926 non-null  int64
 14  no_of_previous_bookings_not_canceled  56926 non-null  int64
 15  avg_price_per_room                    56926 non-null  float64
 16  no_of_special_requests                56926 non-null  int64
 17  booking_status                        56926 non-null  object
dtypes: float64(1), int64(13), object(4)
memory usage: 7.8+ MB
# double check for null values
df.isnull().sum()
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
# check to make sure all rows are valid/meaningful
# create dataframe with only bookings of no nights
df_no_nights = df[(df["no_of_weekend_nights"] == 0) & (df["no_of_week_nights"] == 0)]
df_no_nights.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 324 | 1 | 0 | 0 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2018 | 2 | 27 | Complementary | 0 | 0 | 0 | 0.0 | 1 | Not_Canceled |
| 399 | 1 | 0 | 0 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 23 | 2019 | 1 | 11 | Online | 0 | 0 | 0 | 0.0 | 2 | Not_Canceled |
| 1795 | 2 | 0 | 0 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 145 | 2018 | 7 | 5 | Online | 0 | 0 | 0 | 0.0 | 1 | Not_Canceled |
| 2159 | 3 | 0 | 0 | 0 | Meal Plan 1 | 0 | Room_Type 4 | 57 | 2018 | 4 | 1 | Online | 0 | 0 | 0 | 0.0 | 2 | Not_Canceled |
| 2971 | 2 | 0 | 0 | 0 | Meal Plan 2 | 0 | Room_Type 1 | 247 | 2018 | 6 | 6 | Online | 0 | 0 | 0 | 0.0 | 1 | Not_Canceled |
df_no_nights.shape
(102, 18)
# exclude bookings with no nights from the dataframe
df = df[(df["no_of_weekend_nights"] > 0) | (df["no_of_week_nights"] > 0)]
df.shape
(56824, 18)
# check for duplicate rows
df.duplicated().sum()
14347
# drop duplicate rows
df.drop_duplicates(inplace=True)
# check shape of dataset to make sure duplicate rows have been removed
df.shape
(42477, 18)
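The columns listed by `df.info()` include no booking identifier, so identical rows could be either true duplicates or distinct bookings that happen to share every attribute. A minimal sketch of inspecting duplicate groups before dropping them, using a small made-up frame (the values are hypothetical, not taken from the hotel data):

```python
import pandas as pd

# Toy data for illustration only; values are hypothetical
toy = pd.DataFrame(
    {
        "lead_time": [10, 10, 25, 10],
        "avg_price_per_room": [90.0, 90.0, 120.0, 90.0],
        "booking_status": ["Canceled", "Canceled", "Not_Canceled", "Canceled"],
    }
)

# keep=False flags every member of a duplicate group, not only the repeats
dup_groups = toy[toy.duplicated(keep=False)]

# rows that drop_duplicates() would remove (all but the first of each group)
n_removed = int(toy.duplicated().sum())

# drop the repeats and rebuild a clean 0..n-1 index
deduped = toy.drop_duplicates().reset_index(drop=True)
```

Looking at `dup_groups` first makes it easier to judge whether the repeated rows are plausible separate bookings before they are discarded.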
# check descriptive statistics of the data
df.describe(include="all").T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| no_of_adults | 42477.0 | NaN | NaN | NaN | 1.917367 | 0.527383 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 |
| no_of_children | 42477.0 | NaN | NaN | NaN | 0.142242 | 0.460068 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| no_of_weekend_nights | 42477.0 | NaN | NaN | NaN | 0.897356 | 0.887845 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 |
| no_of_week_nights | 42477.0 | NaN | NaN | NaN | 2.326577 | 1.516955 | 0.0 | 1.0 | 2.0 | 3.0 | 17.0 |
| type_of_meal_plan | 42477 | 4 | Meal Plan 1 | 31792 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| required_car_parking_space | 42477.0 | NaN | NaN | NaN | 0.034442 | 0.182364 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| room_type_reserved | 42477 | 7 | Room_Type 1 | 29652 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| lead_time | 42477.0 | NaN | NaN | NaN | 77.397627 | 77.280895 | 0.0 | 16.0 | 53.0 | 118.0 | 521.0 |
| arrival_year | 42477.0 | NaN | NaN | NaN | 2018.298538 | 0.625848 | 2017.0 | 2018.0 | 2018.0 | 2019.0 | 2019.0 |
| arrival_month | 42477.0 | NaN | NaN | NaN | 6.365421 | 3.050004 | 1.0 | 4.0 | 6.0 | 9.0 | 12.0 |
| arrival_date | 42477.0 | NaN | NaN | NaN | 15.684535 | 8.814081 | 1.0 | 8.0 | 16.0 | 23.0 | 31.0 |
| market_segment_type | 42477 | 5 | Online | 34086 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| repeated_guest | 42477.0 | NaN | NaN | NaN | 0.03077 | 0.172695 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| no_of_previous_cancellations | 42477.0 | NaN | NaN | NaN | 0.025426 | 0.358547 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 |
| no_of_previous_bookings_not_canceled | 42477.0 | NaN | NaN | NaN | 0.222921 | 2.244717 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 |
| avg_price_per_room | 42477.0 | NaN | NaN | NaN | 112.637711 | 40.551351 | 0.0 | 85.5 | 107.03 | 135.0 | 540.0 |
| no_of_special_requests | 42477.0 | NaN | NaN | NaN | 0.768369 | 0.837487 | 0.0 | 0.0 | 1.0 | 1.0 | 5.0 |
| booking_status | 42477 | 2 | Not_Canceled | 27992 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
# view value counts of categorical/discrete columns
cat_col = [
    "type_of_meal_plan",
    "required_car_parking_space",
    "room_type_reserved",
    "arrival_year",
    "arrival_month",
    "market_segment_type",
    "repeated_guest",
    "no_of_special_requests",
    "booking_status",
]
for i in cat_col:
    print(df[i].value_counts())
    print("*" * 50)
Meal Plan 1     31792
Not Selected     8694
Meal Plan 2      1983
Meal Plan 3         8
Name: type_of_meal_plan, dtype: int64
**************************************************
0    41014
1     1463
Name: required_car_parking_space, dtype: int64
**************************************************
Room_Type 1    29652
Room_Type 4     9357
Room_Type 6     1537
Room_Type 5      904
Room_Type 2      715
Room_Type 7      306
Room_Type 3        6
Name: room_type_reserved, dtype: int64
**************************************************
2018    22054
2019    16552
2017     3871
Name: arrival_year, dtype: int64
**************************************************
8     5308
7     4722
5     4343
4     4221
6     4064
3     4037
10    3194
9     3049
2     2875
12    2377
11    2183
1     2104
Name: arrival_month, dtype: int64
**************************************************
Online           34086
Offline           5777
Corporate         1939
Complementary      480
Aviation           195
Name: market_segment_type, dtype: int64
**************************************************
0    41170
1     1307
Name: repeated_guest, dtype: int64
**************************************************
0    19180
1    15533
2     6369
3     1229
4      150
5       16
Name: no_of_special_requests, dtype: int64
**************************************************
Not_Canceled    27992
Canceled        14485
Name: booking_status, dtype: int64
**************************************************
Define function to plot histogram and boxplot together on the same scale
# For any numerical variable, it is important to check central tendency and dispersion.
# Define a function to create a boxplot and histogram for an input variable (numerical column).
# Plot the boxplot and histogram on the same scale.
# This function definition was provided by Great Learning
def histogram_boxplot(data, feature, figsize=(12, 7), kde=True, bins=None):
    """
    Boxplot and histogram on same scale
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default True)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a star will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
Define function for labeled barplot
# function to create labeled barplots provided by Great Learning
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=((count + 1) * 2, 5))
    else:
        plt.figure(figsize=((n + 1) * 2, 5))
    plt.xticks(rotation=45, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="hls",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # top of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the count or percentage
    # plt.show() is left to the caller: calling it here would close the figure,
    # so a later plt.savefig() would write an empty image
# define horizontal labeled barplot for when there are many categories
def horiz_labeled_barplot(data, feature, n=None):
    """
    Horizontal barplot with count and percentage in the bar/on the side
    data: dataframe
    feature: dataframe column
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count, (count * 0.6)))
    else:
        plt.figure(figsize=(n, (n * 0.6)))
    ax = sns.countplot(
        data=data,
        y=feature,
        palette="hls",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        width = p.get_width()
        height = p.get_height()
        label = "{:1.0f} | {:.2f}%".format(width, 100 * width / total)
        x = width
        y = p.get_y() + height / 2
        ax.annotate(label, (x, y), ha="left", va="center", size=9)
    # plt.show() is left to the caller: calling it here would close the figure,
    # so a later plt.savefig() would write an empty image
# plot histogram and boxplot of number of adults
histogram_boxplot(df, "no_of_adults")
plt.title("Number of Adults")
plt.show()
# plot labeled barplot of number of adults, as this is discrete
labeled_barplot(df, "no_of_adults", perc=True)
plt.show()
# plot histogram and boxplot of number of children
histogram_boxplot(df, "no_of_children")
plt.title("Number of Children")
plt.show()
# plot labeled barplot of number of children, as these values are discrete
labeled_barplot(df, "no_of_children")
plt.savefig("child_bar.jpg", bbox_inches="tight")
plt.show()
# plot histogram and boxplot of number of weekend nights
histogram_boxplot(df, "no_of_weekend_nights")
plt.title("Number of Weekend Nights")
plt.show()
# plot labeled barplot of number of weekend nights, as values are discrete
labeled_barplot(df, "no_of_weekend_nights", perc=True)
plt.savefig("weekend_bar.jpg", bbox_inches="tight")
plt.show()
# plot histogram and boxplot of number of week nights
histogram_boxplot(df, "no_of_week_nights")
plt.title("Number of Week Nights")
plt.show()
# plot labeled barplot of number of week nights, as values are discrete
labeled_barplot(df, "no_of_week_nights")
plt.show()
# plot horizontal labeled barplot of number of week nights, as there are many levels
horiz_labeled_barplot(df, "no_of_week_nights")
plt.savefig("weeknights_bar.jpg", bbox_inches="tight")
plt.show()
# plot histogram and boxplot of lead time
histogram_boxplot(df, "lead_time")
plt.title("Number of Days Between Booking and Arrival Date")
plt.xlabel("lead time (in days)")
plt.savefig("lead_time_hist_box.jpg", bbox_inches="tight")
plt.show()
# plot histogram and boxplot of arrival year
histogram_boxplot(df, "arrival_year")
plt.title("Year of Arrival Date")
plt.show()
# plot labeled barplot of arrival year
labeled_barplot(df, "arrival_year", perc=True)
plt.savefig("year_bar.jpg", bbox_inches="tight")
plt.show()
# plot histogram and boxplot of arrival month
histogram_boxplot(df, "arrival_month")
plt.title("Month of Arrival Date")
plt.show()
# plot labeled barplot of arrival month
labeled_barplot(df, "arrival_month")
plt.show()
# plot horizontal labeled barplot of arrival month
horiz_labeled_barplot(df, "arrival_month")
plt.show()
# plot histogram and boxplot of arrival date
histogram_boxplot(df, "arrival_date")
plt.title("Arrival Date")
plt.savefig("date_hist_box.jpg", bbox_inches="tight")
plt.show()
# plot histogram and boxplot of number of previous cancellations
histogram_boxplot(df, "no_of_previous_cancellations")
plt.title("Number of Previous Cancellations")
plt.show()
# plot barplot of number of previous cancellations
labeled_barplot(df, "no_of_previous_cancellations", perc=True)
plt.show()
# plot histogram and boxplot of number of previous bookings not canceled
histogram_boxplot(df, "no_of_previous_bookings_not_canceled")
plt.title("Number of Previous Bookings Not Canceled")
plt.savefig("prev_book_hist_box.jpg", bbox_inches="tight")
plt.show()
# plot histogram and boxplot of average price per day of the reservation
histogram_boxplot(df, "avg_price_per_room")
plt.title("Average Price Per Day of the Reservation")
plt.xlabel("price per day (in euros)")
plt.savefig("avg_price_hist_box.jpg", bbox_inches="tight")
plt.show()
# plot histogram and boxplot of number of special requests
histogram_boxplot(df, "no_of_special_requests")
plt.title("Number of Special Requests")
plt.show()
# plot barplot of number of special requests
labeled_barplot(df, "no_of_special_requests", perc=True)
plt.show()
# plot barplot of type of meal plan
labeled_barplot(df, "type_of_meal_plan", perc=True)
plt.show()
# plot barplot of car parking space required
labeled_barplot(df, "required_car_parking_space", perc=True)
plt.show()
# plot barplot of room type reserved
labeled_barplot(df, "room_type_reserved")
plt.show()
# plot barplot of market segment designation
labeled_barplot(df, "market_segment_type", perc=True)
plt.show()
# plot barplot of repeated guest
labeled_barplot(df, "repeated_guest", perc=True)
plt.show()
# plot barplot of booking status
labeled_barplot(df, "booking_status", perc=True)
plt.show()
cols_list = df.select_dtypes(include=np.number).columns.tolist()
# correlation heatmap
plt.figure(figsize=(15, 9), dpi=300)
sns.heatmap(
df[cols_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.savefig("corr_heatmap.jpg", bbox_inches="tight")
plt.show()
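A heatmap can be complemented by a ranked list of the strongest pairwise correlations. The following is a minimal sketch on synthetic data; the columns `a`, `b`, and `c` are hypothetical, and with the notebook's data `toy` would presumably be replaced by `df[cols_list]`.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only: b is built to track a closely, c is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame(
    {
        "a": a,
        "b": a + rng.normal(scale=0.1, size=200),
        "c": rng.normal(size=200),
    }
)

corr = toy.corr()

# keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# order the pairs by absolute correlation, strongest first
pairs = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
print(pairs)
```

Sorting by absolute value keeps strong negative correlations near the top as well.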
Variable pairs that stand out in the correlation heatmap:
- repeated_guest with both no_of_previous_cancellations and no_of_previous_bookings_not_canceled, which makes sense, as only repeat guests can have previous bookings in the first place
- no_of_previous_cancellations and no_of_previous_bookings_not_canceled
- avg_price_per_room with both no_of_adults and no_of_children. As more people stay in a room, the room may have to be larger to accommodate them, and therefore be more expensive.
# look at pairwise relationships of numerical variables
cols_list = df.select_dtypes(include=np.number).columns.tolist()
# remove parking space and repeated guest as they are binary values
cols_to_remove = ["required_car_parking_space", "repeated_guest"]
for col_name in cols_to_remove:
    cols_list.remove(col_name)
sns.pairplot(data=df[cols_list], diag_kind="kde")
plt.show()
# look at pairwise relationship between numerical variables with hue of booking status
sns.pairplot(
df, hue="booking_status",
)
plt.savefig("pairplot.jpg", bbox_inches="tight")
plt.show()
### function to plot distributions wrt target
def distribution_plot_wrt_target(data, predictor, target):
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))
    target_uniq = data[target].unique()
    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )
    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )
    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )
    plt.tight_layout()
    plt.show()
# plot number of nights (weekend/weeknights) vs booking status
# use the cleaned dataframe df (not the raw data) so the plot matches the rest of the analysis
cols = ["no_of_weekend_nights", "no_of_week_nights"]
plt.figure(figsize=(10, 5))
for i, variable in enumerate(cols):
    plt.subplot(1, 2, i + 1)
    sns.boxplot(data=df, x="booking_status", y=variable, palette="hls")
    plt.tight_layout()
    plt.title(variable)
plt.show()
# plot dist of lead time w.r.t. booking status with/without outliers
distribution_plot_wrt_target(df, "lead_time", "booking_status")
# plot dist of average price per room w.r.t. booking status, with/without outliers
distribution_plot_wrt_target(df, "avg_price_per_room", "booking_status")
# function to plot stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 6))
    plt.legend(
        loc="lower left", frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
# plot number of adults vs booking status
stacked_barplot(df, "no_of_adults", "booking_status")
booking_status  Canceled  Not_Canceled    All
no_of_adults
All                14485         27992  42477
2                  10996         20013  31009
3                   1813          2216   4029
1                   1589          5638   7227
0                     76           108    184
4                     11            17     28
------------------------------------------------------------------------------------------------------------------------
# plot number of children vs booking status
stacked_barplot(df, "no_of_children", "booking_status")
booking_status  Canceled  Not_Canceled    All
no_of_children
All                14485         27992  42477
0                  12578         25630  38208
1                   1009          1548   2557
2                    883           787   1670
3                     14            25     39
9                      1             1      2
10                     0             1      1
------------------------------------------------------------------------------------------------------------------------
# plot number of weekend nights vs booking status
stacked_barplot(df, "no_of_weekend_nights", "booking_status")
booking_status        Canceled  Not_Canceled    All
no_of_weekend_nights
All                      14485         27992  42477
0                         5628         12102  17730
2                         4417          7570  11987
1                         4130          8130  12260
4                          148            68    216
3                          117           103    220
5                           21             9     30
6                           21            10     31
8                            2             0      2
7                            1             0      1
------------------------------------------------------------------------------------------------------------------------
# plot number of week nights vs booking status
stacked_barplot(df, "no_of_week_nights", "booking_status")
booking_status     Canceled  Not_Canceled    All
no_of_week_nights
All                   14485         27992  42477
2                      3979          7785  11764
3                      3483          6177   9660
1                      3038          7868  10906
4                      1704          2432   4136
5                      1104          1401   2505
0                       689          2009   2698
6                       161           140    301
7                        90            75    165
10                       79            15     94
8                        74            47    121
9                        29            19     48
11                       17             3     20
12                       11             5     16
15                        8             6     14
13                        7             2      9
14                        5             5     10
16                        5             2      7
17                        2             1      3
------------------------------------------------------------------------------------------------------------------------
# plot type of meal plan vs booking status
stacked_barplot(df, "type_of_meal_plan", "booking_status")
booking_status     Canceled  Not_Canceled    All
type_of_meal_plan
All                   14485         27992  42477
Meal Plan 1           10509         21283  31792
Not Selected           3118          5576   8694
Meal Plan 2             857          1126   1983
Meal Plan 3               1             7      8
------------------------------------------------------------------------------------------------------------------------
# plot room type reserved vs booking status
stacked_barplot(df, "room_type_reserved", "booking_status")
booking_status      Canceled  Not_Canceled    All
room_type_reserved
All                    14485         27992  42477
Room_Type 1             9223         20429  29652
Room_Type 4             3683          5674   9357
Room_Type 6              826           711   1537
Room_Type 5              367           537    904
Room_Type 2              274           441    715
Room_Type 7              110           196    306
Room_Type 3                2             4      6
------------------------------------------------------------------------------------------------------------------------
# plot room type vs average price per room
g = sns.catplot(data=df, x="room_type_reserved", y="avg_price_per_room")
g.set_xticklabels(rotation=80)
# plot room type vs average price per room vs status
g = sns.catplot(
data=df,
x="room_type_reserved",
y="avg_price_per_room",
hue="booking_status",
alpha=0.5,
)
g.set_xticklabels(rotation=80)
plt.savefig("price_room_status.jpg", bbox_inches="tight")
# plot room type vs average price per room
sns.boxplot(data=df, x="avg_price_per_room", y="room_type_reserved")
# plot room type vs average price per room vs booking status
sns.boxplot(
data=df, x="avg_price_per_room", y="room_type_reserved", hue="booking_status"
)
# median of average price per room grouped by room type
df.groupby(by="room_type_reserved")["avg_price_per_room"].median()
room_type_reserved
Room_Type 1     96.300
Room_Type 2     86.630
Room_Type 3     95.375
Room_Type 4    133.200
Room_Type 5    162.000
Room_Type 6    190.000
Room_Type 7    211.705
Name: avg_price_per_room, dtype: float64
# plot arrival year vs booking status
stacked_barplot(df, "arrival_year", "booking_status")
booking_status  Canceled  Not_Canceled    All
arrival_year
All                14485         27992  42477
2019                7045          9507  16552
2018                6965         15089  22054
2017                 475          3396   3871
------------------------------------------------------------------------------------------------------------------------
# plot arrival month vs booking status
stacked_barplot(df, "arrival_month", "booking_status")
booking_status  Canceled  Not_Canceled    All
arrival_month
All                14485         27992  42477
8                   2475          2833   5308
7                   2240          2482   4722
5                   1674          2669   4343
4                   1627          2594   4221
6                   1583          2481   4064
3                   1195          2842   4037
10                   918          2276   3194
9                    887          2162   3049
2                    796          2079   2875
11                   496          1687   2183
12                   340          2037   2377
1                    254          1850   2104
------------------------------------------------------------------------------------------------------------------------
# look at value counts of each arrival month, group by booking status
df.groupby(by="booking_status")["arrival_month"].value_counts()
booking_status arrival_month
Canceled 8 2475
7 2240
5 1674
4 1627
6 1583
3 1195
10 918
9 887
2 796
11 496
12 340
1 254
Not_Canceled 3 2842
8 2833
5 2669
4 2594
7 2482
6 2481
10 2276
9 2162
2 2079
12 2037
1 1850
11 1687
Name: arrival_month, dtype: int64
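The per-month counts above are easier to compare as rates. From those counts, August has 2475 cancellations out of 5308 bookings (about 46.6%), while January has 254 out of 2104 (about 12.1%). A minimal sketch of computing such rates with `pd.crosstab(..., normalize="index")`, shown on a small made-up frame rather than the hotel data:

```python
import pandas as pd

# Toy data for illustration only; values are hypothetical
toy = pd.DataFrame(
    {
        "arrival_month": [1, 1, 1, 1, 8, 8, 8, 8],
        "booking_status": [
            "Canceled",
            "Not_Canceled",
            "Not_Canceled",
            "Not_Canceled",
            "Canceled",
            "Canceled",
            "Not_Canceled",
            "Not_Canceled",
        ],
    }
)

# normalize="index" turns each row (month) into proportions that sum to 1
rates = pd.crosstab(toy["arrival_month"], toy["booking_status"], normalize="index")

# cancellation rate per month, highest first
cancel_rate = rates["Canceled"].sort_values(ascending=False)
print(cancel_rate)
```

With the notebook's data, the same call on `df["arrival_month"]` and `df["booking_status"]` would give the monthly cancellation rate directly, which removes the effect of some months simply having more bookings.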
# plot market segment type vs booking status
stacked_barplot(df, "market_segment_type", "booking_status")
booking_status       Canceled  Not_Canceled    All
market_segment_type
All                     14485         27992  42477
Online                  13481         20605  34086
Offline                   804          4973   5777
Corporate                 167          1772   1939
Aviation                   33           162    195
Complementary               0           480    480
------------------------------------------------------------------------------------------------------------------------
# plot number of special requests vs market segment type vs booking status
sns.boxplot(
data=df, x="no_of_special_requests", y="market_segment_type", hue="booking_status"
)
plt.legend(loc="lower right")
# plot price vs market segment type
sns.boxplot(data=df, x="avg_price_per_room", y="market_segment_type")
# plot lead time vs market segment type
sns.boxplot(data=df, x="lead_time", y="market_segment_type")
# plot lead time vs price vs booking status vs market segment type
sns.relplot(
data=df,
x="lead_time",
y="avg_price_per_room",
col="market_segment_type",
col_wrap=3,
height=4,
hue="booking_status",
style="booking_status",
kind="scatter",
alpha=0.5,
)
# find value counts of booking status for free rooms booked Online
df[(df["market_segment_type"] == "Online") & (df["avg_price_per_room"] == 0)][
"booking_status"
].value_counts()
Not_Canceled    93
Canceled         8
Name: booking_status, dtype: int64
# plot lead time vs price vs market segment type
g = sns.FacetGrid(df, col="market_segment_type", col_wrap=3, height=4)
g.map(sns.lineplot, "lead_time", "avg_price_per_room", ci=None)
# plot repeated guests vs booking status
stacked_barplot(df, "repeated_guest", "booking_status")
booking_status  Canceled  Not_Canceled    All
repeated_guest
All                14485         27992  42477
0                  14475         26695  41170
1                     10          1297   1307
------------------------------------------------------------------------------------------------------------------------
# plot number of previous cancellations vs booking status
stacked_barplot(df, "no_of_previous_cancellations", "booking_status")
booking_status                Canceled  Not_Canceled    All
no_of_previous_cancellations
All                              14485         27992  42477
0                                14475         27560  42035
1                                    8           239    247
3                                    1            46     47
13                                   1             0      1
2                                    0            66     66
4                                    0            24     24
5                                    0            16     16
6                                    0            16     16
11                                   0            25     25
------------------------------------------------------------------------------------------------------------------------
# plot number of special requests vs booking status
stacked_barplot(df, "no_of_special_requests", "booking_status")
booking_status          Canceled  Not_Canceled    All
no_of_special_requests
All                        14485         27992  42477
0                           8750         10430  19180
1                           4346         11187  15533
2                           1389          4980   6369
3                              0          1229   1229
4                              0           150    150
5                              0            16     16
------------------------------------------------------------------------------------------------------------------------
f1_score should be maximized for better chances at identifying both the classes correctly.
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
model, predictors, target, threshold=0.5
):
"""
Function to compute different metrics to check classification model performance
model: classifier
predictors: independent variables
target: dependent variable
threshold: threshold for classifying the observation as class 1
"""
# checking which probabilities are greater than threshold
pred_temp = model.predict(predictors) > threshold
# rounding off the above values to get classes
pred = np.round(pred_temp)
acc = accuracy_score(target, pred) # to compute Accuracy
recall = recall_score(target, pred) # to compute Recall
precision = precision_score(target, pred) # to compute Precision
f1 = f1_score(target, pred) # to compute F1-score
# creating a dataframe of metrics
df_perf = pd.DataFrame(
{"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
index=[0],
)
return df_perf
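To make the four metrics returned by the function above concrete, here is a minimal sketch that derives them by hand from confusion-matrix counts, with class 1 standing for a canceled booking. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from this dataset.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # recall: share of actual class-1 rows that were predicted as class 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # precision: share of predicted class-1 rows that really are class 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


# hypothetical counts, for illustration only
metrics = classification_metrics(tp=60, fp=20, fn=40, tn=180)
print(metrics)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is a reasonable single number to maximize when both kinds of error matter.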
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
# change month from numerical to categorical
import calendar
df["month"] = df["arrival_month"].apply(lambda x: calendar.month_name[x])
# drop arrival_month
df.drop("arrival_month", axis=1, inplace=True)
df.head()
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled | October |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled | November |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled | February |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled | May |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | Canceled | July |
# change the column name from booking_status to booking_canceled
df1 = df.rename(columns={"booking_status": "booking_canceled"})
df1.head()
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_canceled | month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled | October |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled | November |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled | February |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled | May |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | Canceled | July |
# create change in structure from strings to binary
replaceStatus = {"booking_canceled": {"Not_Canceled": 0, "Canceled": 1}}
# apply the replacement
df1 = df1.replace(replaceStatus)
df1.head()
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_canceled | month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | 0 | October |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | 0 | November |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | 1 | February |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | 1 | May |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | 1 | July |
# split data
# drop arrival_year, as we cannot go back in time--only forward
X = df1.drop(["booking_canceled", "arrival_year"], axis=1)
y = df1["booking_canceled"]
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
# splitting in training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
# check training & test sets to make sure they are comparable
print("Shape of training set:", X_train.shape)
print("Shape of test set:", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of training set: (29733, 36)
Shape of test set: (12744, 36)
Percentage of classes in training set:
0    0.661588
1    0.338412
Name: booking_canceled, dtype: float64
Percentage of classes in test set:
0    0.652935
1    0.347065
Name: booking_canceled, dtype: float64
# There are different solvers available in Sklearn logistic regression
# The newton-cg solver is faster for high-dimensional data
model = LogisticRegression(solver="newton-cg", random_state=1)
lg = model.fit(X_train, y_train)
# predicting on training set
y_pred_train = lg.predict(X_train)
log_reg_model_train_perf_sklearn = model_performance_classification_statsmodels(
lg, X_train, y_train
)
print("Training set performance:")
log_reg_model_train_perf_sklearn
Training set performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.794202 | 0.623932 | 0.728898 | 0.672343 |
# predicting on the test set
y_pred_test = lg.predict(X_test)
log_reg_model_test_perf_sklearn = model_performance_classification_statsmodels(
lg, X_test, y_test
)
print("Test set performance:")
log_reg_model_test_perf_sklearn
Test set performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.790333 | 0.620393 | 0.734279 | 0.672549 |
Some features are very skewed and may behave better on a different scale. We can try to transform continuous variables, so we will check lead_time and avg_price_per_room.
# plot columns that are skewed
cols_to_trans = ["lead_time", "avg_price_per_room"]
for colname in cols_to_trans:
    plt.hist(df[colname], bins=50)
    plt.title(colname)
    plt.show()
    print(np.sum(df[colname] <= 0))  # print number of values less than or equal to zero
1588
542
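Both columns contain values less than or equal to zero, so a plain log transformation is undefined for those rows. We will compare a shifted log, np.sqrt, and np.arcsinh.
# plot log transformation of lead time (shifted by 0.01 to handle zeros)
plt.hist(np.log(df["lead_time"] + 0.01), 50)
plt.title("log(lead_time + 0.01)")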
plt.show()
# plot inverse hyperbolic sine transformation of lead time
plt.hist(np.arcsinh(df["lead_time"]), 50)
plt.title("arcsinh(lead_time)")
plt.show()
# plot square root transformation of lead time
plt.hist(np.sqrt(df["lead_time"]), 50)
plt.title("sqrt[lead_time]")
plt.show()
# plot log transformation of average price per room (shifted by 0.01 to handle zeros)
plt.hist(np.log(df["avg_price_per_room"] + 0.01), 50)
plt.title("log(avg_price_per_room + 0.01)")
plt.show()
# plot inverse hyperbolic sine transformation of average price per room
plt.hist(np.arcsinh(df["avg_price_per_room"]), 50)
plt.title("arcsinh(avg_price_per_room)")
plt.show()
# plot square root transformation of average price per room
plt.hist(np.sqrt(df["avg_price_per_room"]), 50)
plt.title("sqrt[avg_price_per_room]")
plt.show()
We will choose the square root transformation because it makes the data slightly less skewed without requiring an offset to be added to the data, and it may be easier to interpret than the other transformations.
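The comparison above is made by eye from the histograms. As a complement, the following minimal sketch quantifies skewness before and after each candidate transformation. The helper `skewness` and the list `sample` are hypothetical (a small right-skewed sample that includes a zero, loosely mimicking lead_time); they are not taken from the dataset, and in the notebook itself `df[colname].skew()` from pandas would serve the same purpose.

```python
import math


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n


# hypothetical right-skewed sample that includes a zero
sample = [0, 1, 2, 3, 5, 8, 13, 40, 120, 300]

transforms = {
    "raw": lambda v: v,
    "log(x + 0.01)": lambda v: math.log(v + 0.01),
    "arcsinh(x)": math.asinh,
    "sqrt(x)": math.sqrt,
}

for name, fn in transforms.items():
    # values closer to 0 indicate a more symmetric distribution
    print(name, round(skewness([fn(v) for v in sample]), 3))
```

A skewness near zero indicates a roughly symmetric distribution, so the transformation with the smallest absolute value is the most symmetric on this measure; interpretability and the handling of zeros remain separate considerations.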
# create a new copy of the data and apply square root transformation
X1 = df1.copy()
for colname in cols_to_trans:
    X1[colname + "_sqrt"] = np.sqrt(X1[colname])
X1.drop(cols_to_trans, axis=1, inplace=True)
X1.head()
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | arrival_year | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | no_of_special_requests | booking_canceled | month | lead_time_sqrt | avg_price_per_room_sqrt | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 2017 | 2 | Offline | 0 | 0 | 0 | 0 | 0 | October | 14.966630 | 8.062258 |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 2018 | 6 | Online | 0 | 0 | 0 | 1 | 0 | November | 2.236068 | 10.328601 |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 2018 | 28 | Online | 0 | 0 | 0 | 0 | 1 | February | 1.000000 | 7.745967 |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 2018 | 20 | Online | 0 | 0 | 0 | 0 | 1 | May | 14.525839 | 10.000000 |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 2019 | 13 | Online | 0 | 0 | 0 | 2 | 1 | July | 16.643317 | 9.439280 |
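# split data
# note: arrival_year is kept here, unlike in the first split, so this design matrix has 37 columns instead of 36
X2 = X1.drop(["booking_canceled"], axis=1)
y = X1["booking_canceled"]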
# creating dummy variables
X2 = pd.get_dummies(X2, drop_first=True)
# splitting in training and test set
X2_train, X2_test, y_train, y_test = train_test_split(
X2, y, test_size=0.3, random_state=1
)
# check training & test sets to make sure they are comparable
print("Shape of training set:", X2_train.shape)
print("Shape of test set:", X2_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of training set: (29733, 37)
Shape of test set: (12744, 37)
Percentage of classes in training set:
0    0.661588
1    0.338412
Name: booking_canceled, dtype: float64
Percentage of classes in test set:
0    0.652935
1    0.347065
Name: booking_canceled, dtype: float64
# There are different solvers available in Sklearn logistic regression
# The newton-cg solver is faster for high-dimensional data
model = LogisticRegression(solver="newton-cg", random_state=1)
lg4 = model.fit(X2_train, y_train)
# predicting on training set
y_pred_train2 = lg4.predict(X2_train)
# training performance
log_reg_model_train_perf_sqrt = model_performance_classification_statsmodels(
lg4, X2_train, y_train
)
print("Training performance:")
log_reg_model_train_perf_sqrt
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.793226 | 0.638243 | 0.719149 | 0.676285 |
# predicting on the test set
y_pred_test2 = lg4.predict(X2_test)
log_reg_model_test_perf_sqrt = model_performance_classification_statsmodels(
lg4, X2_test, y_test
)
print("Test performance:")
log_reg_model_test_perf_sqrt
Test performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.789626 | 0.63102 | 0.726823 | 0.675542 |
# split data
X = df1.drop(["booking_canceled", "arrival_year"], axis=1)
y = df1["booking_canceled"]
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
# adding constant
X = sm.add_constant(X)
# splitting to training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
# fit logit regression
logit = sm.Logit(y_train, X_train.astype(float))
lg = logit.fit(disp=False)
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_canceled No. Observations: 29733
Model: Logit Df Residuals: 29696
Method: MLE Df Model: 36
Date: Sat, 23 Oct 2021 Pseudo R-squ.: 0.3404
Time: 01:48:54 Log-Likelihood: -12551.
converged: False LL-Null: -19028.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -3.0021 0.256 -11.743 0.000 -3.503 -2.501
no_of_adults -0.0351 0.036 -0.971 0.332 -0.106 0.036
no_of_children 0.0860 0.051 1.683 0.092 -0.014 0.186
no_of_weekend_nights 0.0471 0.018 2.580 0.010 0.011 0.083
no_of_week_nights 0.1054 0.011 9.773 0.000 0.084 0.127
required_car_parking_space -1.5208 0.116 -13.084 0.000 -1.749 -1.293
lead_time 0.0169 0.000 60.913 0.000 0.016 0.017
arrival_date -0.0029 0.002 -1.640 0.101 -0.006 0.001
repeated_guest -3.6172 0.829 -4.363 0.000 -5.242 -1.992
no_of_previous_cancellations 0.3620 0.116 3.108 0.002 0.134 0.590
no_of_previous_bookings_not_canceled -0.0489 0.093 -0.527 0.598 -0.231 0.133
avg_price_per_room 0.0182 0.001 25.938 0.000 0.017 0.020
no_of_special_requests -1.3296 0.024 -55.214 0.000 -1.377 -1.282
type_of_meal_plan_Meal Plan 2 -0.2058 0.081 -2.553 0.011 -0.364 -0.048
type_of_meal_plan_Meal Plan 3 13.4858 473.293 0.028 0.977 -914.152 941.124
type_of_meal_plan_Not Selected 0.3447 0.043 8.104 0.000 0.261 0.428
room_type_reserved_Room_Type 2 -0.0680 0.131 -0.521 0.603 -0.324 0.188
room_type_reserved_Room_Type 3 0.4909 1.407 0.349 0.727 -2.267 3.249
room_type_reserved_Room_Type 4 -0.1610 0.045 -3.585 0.000 -0.249 -0.073
room_type_reserved_Room_Type 5 -0.4624 0.113 -4.083 0.000 -0.684 -0.240
room_type_reserved_Room_Type 6 -0.5564 0.125 -4.465 0.000 -0.801 -0.312
room_type_reserved_Room_Type 7 -1.1662 0.208 -5.605 0.000 -1.574 -0.758
market_segment_type_Complementary -22.3099 608.873 -0.037 0.971 -1215.679 1171.060
market_segment_type_Corporate -0.6767 0.266 -2.546 0.011 -1.198 -0.156
market_segment_type_Offline -2.3168 0.252 -9.211 0.000 -2.810 -1.824
market_segment_type_Online -0.0746 0.245 -0.305 0.761 -0.555 0.406
month_August -0.0340 0.065 -0.527 0.599 -0.161 0.093
month_December -1.1974 0.104 -11.516 0.000 -1.401 -0.994
month_February 0.6149 0.077 7.962 0.000 0.464 0.766
month_January -0.7546 0.107 -7.031 0.000 -0.965 -0.544
month_July -0.0992 0.065 -1.522 0.128 -0.227 0.028
month_June -0.1979 0.067 -2.932 0.003 -0.330 -0.066
month_March 0.2831 0.069 4.118 0.000 0.148 0.418
month_May -0.2806 0.067 -4.204 0.000 -0.411 -0.150
month_November 0.3615 0.088 4.093 0.000 0.188 0.535
month_October -0.0486 0.077 -0.634 0.526 -0.199 0.102
month_September -0.1344 0.077 -1.738 0.082 -0.286 0.017
========================================================================================================
# training performance
print("Training performance:")
model_performance_classification_statsmodels(lg, X_train, y_train)
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.794303 | 0.624329 | 0.728939 | 0.672591 |
# vif before feature selection
vif_series0 = pd.Series(
[variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
index=X_train.columns,
dtype=float,
)
print("Series before feature selection: \n\n{}\n".format(vif_series0))
Series before feature selection:

const                                   268.323171
no_of_adults                              1.484452
no_of_children                            2.179179
no_of_weekend_nights                      1.088629
no_of_week_nights                         1.140954
required_car_parking_space                1.038128
lead_time                                 1.309824
arrival_date                              1.007115
repeated_guest                            2.046807
no_of_previous_cancellations              1.598834
no_of_previous_bookings_not_canceled      1.979028
avg_price_per_room                        2.921497
no_of_special_requests                    1.116389
type_of_meal_plan_Meal Plan 2             1.106498
type_of_meal_plan_Meal Plan 3             1.025056
type_of_meal_plan_Not Selected            1.280225
room_type_reserved_Room_Type 2            1.101009
room_type_reserved_Room_Type 3            1.001726
room_type_reserved_Room_Type 4            1.444559
room_type_reserved_Room_Type 5            1.129229
room_type_reserved_Room_Type 6            2.170247
room_type_reserved_Room_Type 7            1.177996
market_segment_type_Complementary         3.773144
market_segment_type_Corporate            11.491454
market_segment_type_Offline              29.868965
market_segment_type_Online               39.199890
month_August                              2.036833
month_December                            1.553977
month_February                            1.677861
month_January                             1.524442
month_July                                1.926293
month_June                                1.814071
month_March                               1.867575
month_May                                 1.875275
month_November                            1.494463
month_October                             1.654916
month_September                           1.627122
dtype: float64
market_segment_type_Corporate, market_segment_type_Offline, and market_segment_type_Online have VIF greater than 10. We drop market_segment_type_Online, which has the highest VIF, and check again.
# drop column with high VIF
col_to_drop = "market_segment_type_Online"
x_train1 = X_train.loc[:, ~X_train.columns.str.startswith(col_to_drop)]
x_test1 = X_test.loc[:, ~X_test.columns.str.startswith(col_to_drop)]
# check VIF again
vif_series1 = pd.Series(
[variance_inflation_factor(x_train1.values, i) for i in range(x_train1.shape[1])],
index=x_train1.columns,
dtype=float,
)
print("Series after dropping {}: \n\n{}\n".format(col_to_drop, vif_series1))
Series after dropping market_segment_type_Online:

const                                    50.392643
no_of_adults                              1.465521
no_of_children                            2.178552
no_of_weekend_nights                      1.088563
no_of_week_nights                         1.140885
required_car_parking_space                1.038114
lead_time                                 1.305262
arrival_date                              1.007085
repeated_guest                            2.017450
no_of_previous_cancellations              1.598475
no_of_previous_bookings_not_canceled      1.977009
avg_price_per_room                        2.918400
no_of_special_requests                    1.113433
type_of_meal_plan_Meal Plan 2             1.106498
type_of_meal_plan_Meal Plan 3             1.025055
type_of_meal_plan_Not Selected            1.277363
room_type_reserved_Room_Type 2            1.100732
room_type_reserved_Room_Type 3            1.001725
room_type_reserved_Room_Type 4            1.439932
room_type_reserved_Room_Type 5            1.128680
room_type_reserved_Room_Type 6            2.169777
room_type_reserved_Room_Type 7            1.177654
market_segment_type_Complementary         1.411749
market_segment_type_Corporate             1.720992
market_segment_type_Offline               1.310837
month_August                              2.036686
month_December                            1.553080
month_February                            1.675961
month_January                             1.523384
month_July                                1.926179
month_June                                1.813890
month_March                               1.866425
month_May                                 1.875241
month_November                            1.494019
month_October                             1.654910
month_September                           1.627020
dtype: float64
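A VIF is defined as 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The sketch below illustrates one plausible reason why the market segment dummies had such large values: if the baseline level removed by drop_first=True is rare, the remaining dummies sum to 1 on almost every row and become nearly collinear with each other and with the constant. That the baseline segment is rare in this dataset is an assumption, not something shown above, and the helper names (`pearson_r`, `vif_two_predictors`, `dummies`) and the counts are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def vif_two_predictors(x, y):
    """VIF = 1 / (1 - R^2); with a single other predictor, R^2 equals r^2."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


def dummies(n_base, n_a, n_b):
    """Dummy columns for a 3-level category whose first level is the dropped baseline."""
    d_a = [0] * n_base + [1] * n_a + [0] * n_b
    d_b = [0] * n_base + [0] * n_a + [1] * n_b
    return d_a, d_b


# rare baseline (2 of 100 rows) versus a balanced baseline (34 of 100 rows)
rare = vif_two_predictors(*dummies(2, 40, 58))
balanced = vif_two_predictors(*dummies(34, 33, 33))
print(round(rare, 2), round(balanced, 2))
```

In this toy setting the rare-baseline dummies exceed the usual VIF cutoff of 10 while the balanced ones stay close to 1. Dropping one of the near-redundant dummies, as done above, effectively merges that level into the baseline and removes the near-collinearity.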
Dropping market_segment_type_Online has fixed the multicollinearity: every remaining predictor now has a VIF below 3.
# fit logit regression
logit2 = sm.Logit(y_train, x_train1.astype(float))
lg2 = logit2.fit()
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.422127
Iterations: 35
# training performance
print("Training performance:")
model_performance_classification_statsmodels(lg2, x_train1, y_train)
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.794269 | 0.624329 | 0.728855 | 0.672555 |
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_canceled No. Observations: 29733
Model: Logit Df Residuals: 29697
Method: MLE Df Model: 35
Date: Sat, 23 Oct 2021 Pseudo R-squ.: 0.3404
Time: 01:49:02 Log-Likelihood: -12551.
converged: False LL-Null: -19028.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -3.0720 0.113 -27.136 0.000 -3.294 -2.850
no_of_adults -0.0364 0.036 -1.012 0.311 -0.107 0.034
no_of_children 0.0857 0.051 1.677 0.094 -0.014 0.186
no_of_weekend_nights 0.0471 0.018 2.580 0.010 0.011 0.083
no_of_week_nights 0.1055 0.011 9.775 0.000 0.084 0.127
required_car_parking_space -1.5204 0.116 -13.081 0.000 -1.748 -1.293
lead_time 0.0169 0.000 61.029 0.000 0.016 0.017
arrival_date -0.0029 0.002 -1.639 0.101 -0.006 0.001
repeated_guest -3.6080 0.829 -4.351 0.000 -5.233 -1.983
no_of_previous_cancellations 0.3613 0.116 3.103 0.002 0.133 0.590
no_of_previous_bookings_not_canceled -0.0492 0.093 -0.529 0.597 -0.231 0.133
avg_price_per_room 0.0182 0.001 25.953 0.000 0.017 0.020
no_of_special_requests -1.3298 0.024 -55.239 0.000 -1.377 -1.283
type_of_meal_plan_Meal Plan 2 -0.2057 0.081 -2.552 0.011 -0.364 -0.048
type_of_meal_plan_Meal Plan 3 23.2798 6.34e+04 0.000 1.000 -1.24e+05 1.24e+05
type_of_meal_plan_Not Selected 0.3440 0.042 8.100 0.000 0.261 0.427
room_type_reserved_Room_Type 2 -0.0686 0.131 -0.526 0.599 -0.325 0.187
room_type_reserved_Room_Type 3 0.4906 1.407 0.349 0.727 -2.267 3.249
room_type_reserved_Room_Type 4 -0.1603 0.045 -3.573 0.000 -0.248 -0.072
room_type_reserved_Room_Type 5 -0.4616 0.113 -4.077 0.000 -0.683 -0.240
room_type_reserved_Room_Type 6 -0.5556 0.125 -4.460 0.000 -0.800 -0.311
room_type_reserved_Room_Type 7 -1.1646 0.208 -5.600 0.000 -1.572 -0.757
market_segment_type_Complementary -32.9493 6.34e+04 -0.001 1.000 -1.24e+05 1.24e+05
market_segment_type_Corporate -0.6041 0.118 -5.135 0.000 -0.835 -0.374
market_segment_type_Offline -2.2428 0.065 -34.596 0.000 -2.370 -2.116
month_August -0.0342 0.065 -0.530 0.596 -0.161 0.092
month_December -1.1979 0.104 -11.523 0.000 -1.402 -0.994
month_February 0.6139 0.077 7.957 0.000 0.463 0.765
month_January -0.7554 0.107 -7.042 0.000 -0.966 -0.545
month_July -0.0994 0.065 -1.526 0.127 -0.227 0.028
month_June -0.1979 0.067 -2.933 0.003 -0.330 -0.066
month_March 0.2823 0.069 4.109 0.000 0.148 0.417
month_May -0.2807 0.067 -4.205 0.000 -0.412 -0.150
month_November 0.3610 0.088 4.088 0.000 0.188 0.534
month_October -0.0486 0.077 -0.634 0.526 -0.199 0.102
month_September -0.1346 0.077 -1.740 0.082 -0.286 0.017
========================================================================================================
# run a loop to drop variables with high p-value
# initial list of columns
cols = x_train1.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set (cast to float, as in the other statsmodels fits)
    X_train_aux = x_train1[cols].astype(float)
    # fitting the model
    model = sm.Logit(y_train, X_train_aux).fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    # drop one variable per iteration and refit, until all p-values are <= 0.05
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
['const', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'repeated_guest', 'no_of_previous_cancellations', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'month_December', 'month_February', 'month_January', 'month_June', 'month_March', 'month_May', 'month_November']
# new training set with only selected features
x_train2 = x_train1[selected_features]
# fit logit regression to new training set
logit3 = sm.Logit(y_train, x_train2.astype(float))
lg3 = logit3.fit()
print(lg3.summary())
Optimization terminated successfully.
Current function value: 0.422722
Iterations 10
Logit Regression Results
==============================================================================
Dep. Variable: booking_canceled No. Observations: 29733
Model: Logit Df Residuals: 29709
Method: MLE Df Model: 23
Date: Sat, 23 Oct 2021 Pseudo R-squ.: 0.3395
Time: 01:49:06 Log-Likelihood: -12569.
converged: True LL-Null: -19028.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -3.2630 0.095 -34.497 0.000 -3.448 -3.078
no_of_weekend_nights 0.0483 0.018 2.650 0.008 0.013 0.084
no_of_week_nights 0.1067 0.011 9.911 0.000 0.086 0.128
required_car_parking_space -1.5247 0.116 -13.129 0.000 -1.752 -1.297
lead_time 0.0169 0.000 63.107 0.000 0.016 0.017
repeated_guest -3.8249 0.775 -4.934 0.000 -5.344 -2.305
no_of_previous_cancellations 0.3493 0.113 3.081 0.002 0.127 0.571
avg_price_per_room 0.0184 0.001 28.341 0.000 0.017 0.020
no_of_special_requests -1.3307 0.024 -55.500 0.000 -1.378 -1.284
type_of_meal_plan_Meal Plan 2 -0.2095 0.080 -2.609 0.009 -0.367 -0.052
type_of_meal_plan_Not Selected 0.3415 0.042 8.170 0.000 0.260 0.423
room_type_reserved_Room_Type 4 -0.1850 0.043 -4.277 0.000 -0.270 -0.100
room_type_reserved_Room_Type 5 -0.4936 0.112 -4.392 0.000 -0.714 -0.273
room_type_reserved_Room_Type 6 -0.4391 0.098 -4.467 0.000 -0.632 -0.246
room_type_reserved_Room_Type 7 -1.1281 0.202 -5.585 0.000 -1.524 -0.732
market_segment_type_Corporate -0.5938 0.116 -5.119 0.000 -0.821 -0.366
market_segment_type_Offline -2.2407 0.064 -34.823 0.000 -2.367 -2.115
month_December -1.1532 0.096 -12.024 0.000 -1.341 -0.965
month_February 0.6751 0.067 10.083 0.000 0.544 0.806
month_January -0.6895 0.100 -6.898 0.000 -0.885 -0.494
month_June -0.1441 0.053 -2.693 0.007 -0.249 -0.039
month_March 0.3382 0.057 5.949 0.000 0.227 0.450
month_May -0.2256 0.052 -4.329 0.000 -0.328 -0.123
month_November 0.4179 0.079 5.290 0.000 0.263 0.573
==================================================================================================
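Note: After removing the variables with high $p$-values, the optimization terminates successfully; previously, the maximum number of iterations was exceeded. The dropped variables include type_of_meal_plan_Meal Plan 3 and market_segment_type_Complementary, whose very large coefficients and standard errors in the earlier summaries suggest that they were the source of the convergence problem.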
Now no feature has $p$-value > 0.05 so we will consider the features in x_train2 as the final ones and lg3 as the final model.
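One plausible explanation for the earlier convergence warnings, not confirmed by the outputs above, is quasi-complete separation: if every booking in some category level has the same outcome, the maximum-likelihood coefficient for that dummy drifts toward plus or minus infinity and its standard error explodes. A minimal sketch of a check for this is shown below, using hypothetical toy data and a hypothetical helper `levels_with_single_outcome`.

```python
from collections import defaultdict


def levels_with_single_outcome(categories, outcomes):
    """Return the category levels whose rows all share the same 0/1 outcome."""
    seen = defaultdict(set)
    for level, outcome in zip(categories, outcomes):
        seen[level].add(outcome)
    return sorted(level for level, values in seen.items() if len(values) == 1)


# hypothetical toy data: level "C" is never canceled
segment = ["A", "A", "B", "B", "C", "C", "C"]
canceled = [0, 1, 1, 0, 0, 0, 0]
print(levels_with_single_outcome(segment, canceled))
```

On the notebook's data, a comparable check would be a crosstab of each categorical column against booking_canceled on the training rows, looking for levels with a zero in one of the two outcome columns.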
# converting coefficients to odds ratios (exponentiated coefficients)
odds = np.exp(lg3.params)
# finding the percentage change
perc_change_odds = (np.exp(lg3.params) - 1) * 100
# removing the limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=x_train2.columns).T
| const | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | repeated_guest | no_of_previous_cancellations | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Corporate | market_segment_type_Offline | month_December | month_February | month_January | month_June | month_March | month_May | month_November | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 0.038275 | 1.049479 | 1.112612 | 0.217687 | 1.017069 | 0.021821 | 1.418057 | 1.018558 | 0.264292 | 0.810960 | 1.407088 | 0.831120 | 0.610422 | 0.644595 | 0.323654 | 0.552217 | 0.106383 | 0.315612 | 1.964228 | 0.501816 | 0.865829 | 1.402416 | 0.798063 | 1.518814 |
| Change_odd% | -96.172499 | 4.947939 | 11.261206 | -78.231349 | 1.706942 | -97.817857 | 41.805742 | 1.855848 | -73.570833 | -18.903951 | 40.708757 | -16.887973 | -38.957756 | -35.540522 | -67.634569 | -44.778271 | -89.361749 | -68.438774 | 96.422799 | -49.818352 | -13.417107 | 40.241625 | -20.193720 | 51.881435 |
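The Odds and Change_odd% rows above follow directly from exponentiating the coefficients. The sketch below reproduces that arithmetic with the standard library for two coefficients copied from the lg3 summary (rounded to four decimals, so the results can differ from the table in the third decimal place). The helper names `odds_ratio` and `pct_change_in_odds` and the `delta` parameter are hypothetical.

```python
import math


def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for a `delta`-unit increase in a predictor."""
    return math.exp(coef * delta)


def pct_change_in_odds(coef, delta=1.0):
    """Percentage change in the odds for a `delta`-unit increase in a predictor."""
    return (odds_ratio(coef, delta) - 1.0) * 100.0


# coefficients copied (rounded) from the lg3 summary
lead_time_coef = 0.0169
special_requests_coef = -1.3307

print(round(pct_change_in_odds(lead_time_coef), 2))            # one extra day of lead time
print(round(pct_change_in_odds(lead_time_coef, delta=30), 2))  # thirty extra days
print(round(pct_change_in_odds(special_requests_coef), 2))     # one extra special request
```

Because the effect is multiplicative, a 30-day increase in lead time corresponds to exp(30 × coef), not to 30 times the one-day percentage.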
- no_of_weekend_nights: Holding all other features constant, each additional weekend night multiplies the odds of the booking being canceled by 1.05, a 4.95% increase.
- no_of_week_nights: Holding all other features constant, each additional week night multiplies the odds of cancellation by 1.11, an 11.26% increase.
- required_car_parking_space: Holding all other features constant, a booking that requires a car parking space has 78.23% lower odds of cancellation.
- lead_time: Holding all other features constant, each additional day of lead time increases the odds of cancellation by 1.71%.
- repeated_guest: Holding all other features constant, the odds of cancellation are 97.82% lower for a repeat guest.
- no_of_previous_cancellations: Holding all other features constant, each additional previous cancellation increases the odds of cancellation by 41.81%.
- avg_price_per_room: Holding all other features constant, increasing the average price per room by 1 unit increases the odds of cancellation by 1.86%.
- no_of_special_requests: Holding all other features constant, each additional special request decreases the odds of cancellation by 73.57%.
- type_of_meal_plan: Holding all other features constant, selecting Meal Plan 2 decreases the odds of cancellation by 18.9%, while not selecting a meal plan increases the odds by 40.71%.
- room_type_reserved: Holding all other features constant, reserving room types 4, 5, 6, and 7 decreases the odds of cancellation by 16.89%, 38.96%, 35.54%, and 67.63%, respectively.
- market_segment_type: Holding all other features constant, the Corporate and Offline market segments decrease the odds of cancellation by 44.78% and 89.36%, respectively.
- month: Holding all other features constant, arriving in January, May, June, or December decreases the odds of cancellation by 49.82%, 20.19%, 13.42%, and 68.44%, respectively. Arriving in February, March, or November increases the odds of cancellation by 96.42%, 40.24%, and 51.88%, respectively.
# creating confusion matrix
confusion_matrix_statsmodels(lg3, x_train2, y_train)
# logit regression model training performance
log_reg_model_train_perf = model_performance_classification_statsmodels(
lg3, x_train2, y_train
)
print("Training performance:")
log_reg_model_train_perf
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.794673 | 0.623534 | 0.730299 | 0.672707 |
# ROC-AUC on training set
logit_roc_auc_train = roc_auc_score(y_train, lg3.predict(x_train2))
fpr, tpr, thresholds = roc_curve(y_train, lg3.predict(x_train2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area=%0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# optimal threshold as per AUC-ROC curve
# optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, lg3.predict(x_train2))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.32280954710436865
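The expression np.argmax(tpr - fpr) used above picks the threshold that maximizes the difference between the true positive rate and the false positive rate, a quantity often called Youden's J statistic. The following minimal sketch spells out the same selection in plain Python on hypothetical labels and scores; the function name `youden_threshold` is an assumption made for illustration, and it treats a score greater than or equal to the threshold as a positive prediction.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = TPR - FPR over the observed scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_t, best_j = None, float("-inf")
    # every distinct score is a candidate threshold
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# hypothetical labels and predicted probabilities
y_true = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.9]
print(youden_threshold(y_true, scores))
```

This criterion weights both error types equally, so it tends to move the threshold below 0.5 when the positive class is the minority, which is consistent with the value of about 0.32 found above.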
# creating confusion matrix
confusion_matrix_statsmodels(
lg3, x_train2, y_train, threshold=optimal_threshold_auc_roc
)
# checking performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg3, x_train2, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.775805 | 0.797754 | 0.634144 | 0.706602 |
Let's use the precision-recall curve to see if we can find a better threshold.
y_scores = lg3.predict(x_train2)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
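A common way to read a threshold off a plot like this is to take the point where the precision and recall curves cross. Whether that is how the value used next was chosen is an assumption. The sketch below finds such a crossing programmatically on hypothetical labels and scores; the function name `balanced_pr_threshold` is hypothetical, and a score greater than or equal to the threshold counts as a positive prediction.

```python
def balanced_pr_threshold(y_true, scores):
    """Return the observed score at which |precision - recall| is smallest."""
    positives = sum(y_true)
    best_t, best_gap = None, float("inf")
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        # at least one row is predicted positive because t is an observed score
        predicted_pos = sum(1 for s in scores if s >= t)
        precision = tp / predicted_pos
        recall = tp / positives
        gap = abs(precision - recall)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t


# hypothetical labels and predicted probabilities
y_true = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.9]
print(balanced_pr_threshold(y_true, scores))
```

Choosing the crossing point keeps precision and recall close to each other, which tends to give a reasonable F1 without favoring either type of error.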
# setting the threshold
optimal_threshold_curve = 0.43
# creating confusion matrix
confusion_matrix_statsmodels(lg3, x_train2, y_train, threshold=optimal_threshold_curve)
# training performance
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
    lg3, x_train2, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.792184 | 0.697177 | 0.691337 | 0.694245 |
# training performance comparison
models_train_comp_df = pd.concat(
    [
        log_reg_model_train_perf_sklearn.T,
        log_reg_model_train_perf_sqrt.T,
        log_reg_model_train_perf.T,
        log_reg_model_train_perf_threshold_auc_roc.T,
        log_reg_model_train_perf_threshold_curve.T,
    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Logistic Regression sklearn",
    "Logistic Regression with Sqrt Transformation",
    "Logistic Regression statsmodels",
    "Logistic Regression-0.32 Threshold",
    "Logistic Regression-0.43 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression sklearn | Logistic Regression with Sqrt Transformation | Logistic Regression statsmodels | Logistic Regression-0.32 Threshold | Logistic Regression-0.43 Threshold |
|---|---|---|---|---|---|
| Accuracy | 0.794202 | 0.793226 | 0.794673 | 0.775805 | 0.792184 |
| Recall | 0.623932 | 0.638243 | 0.623534 | 0.797754 | 0.697177 |
| Precision | 0.728898 | 0.719149 | 0.730299 | 0.634144 | 0.691337 |
| F1 | 0.672343 | 0.676285 | 0.672707 | 0.706602 | 0.694245 |
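In the comparison above, lowering the threshold from the default to 0.43 and then to 0.32 raises recall (0.62, 0.70, 0.80) while precision falls (0.73, 0.69, 0.63). The sketch below shows the mechanism on made-up data. `metrics_at_threshold` is a hypothetical helper, and it assumes that a booking is labeled as canceled when its predicted probability is strictly greater than the threshold, which may differ from the notebook's own helper functions.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score


def metrics_at_threshold(y_true, y_prob, threshold=0.5):
    """Binarize probabilities at `threshold` and report recall and precision."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return {
        "threshold": threshold,
        "recall": recall_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
    }


# Hypothetical toy data: 5 negatives followed by 4 positives
y_toy = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])
p_toy = np.array([0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.55, 0.70, 0.90])

rows = [metrics_at_threshold(y_toy, p_toy, t) for t in (0.32, 0.43, 0.50)]
for row in rows:
    print(row)
```

Recall can only stay the same or drop as the threshold rises, because fewer bookings are flagged. Precision usually moves the other way, although that is not guaranteed on every dataset.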
Dropping the columns from the test set that were dropped from the training set
x_test2 = X_test[list(x_train2.columns)]
Using model with default threshold
# creating confusion matrix
confusion_matrix_statsmodels(lg3, x_test2, y_test)
# test performance
log_reg_model_test_perf = model_performance_classification_statsmodels(
    lg3, x_test2, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.790097 | 0.618359 | 0.73482 | 0.671578 |
logit_roc_auc_test = roc_auc_score(y_test, lg3.predict(x_test2))
fpr, tpr, thresholds = roc_curve(y_test, lg3.predict(x_test2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area=%0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
Using model with threshold = 0.32
# creating confusion matrix
confusion_matrix_statsmodels(lg3, x_test2, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
    lg3, x_test2, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.771343 | 0.793579 | 0.636908 | 0.706664 |
Using model with threshold = 0.43
# creating confusion matrix
confusion_matrix_statsmodels(lg3, x_test2, y_test, threshold=optimal_threshold_curve)
# test performance
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
    lg3, x_test2, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.78688 | 0.686186 | 0.695622 | 0.690872 |
# training performance summary
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression sklearn | Logistic Regression with Sqrt Transformation | Logistic Regression statsmodels | Logistic Regression-0.32 Threshold | Logistic Regression-0.43 Threshold |
|---|---|---|---|---|---|
| Accuracy | 0.794202 | 0.793226 | 0.794673 | 0.775805 | 0.792184 |
| Recall | 0.623932 | 0.638243 | 0.623534 | 0.797754 | 0.697177 |
| Precision | 0.728898 | 0.719149 | 0.730299 | 0.634144 | 0.691337 |
| F1 | 0.672343 | 0.676285 | 0.672707 | 0.706602 | 0.694245 |
# test performance summary
models_test_comp_df = pd.concat(
    [
        log_reg_model_test_perf_sklearn.T,
        log_reg_model_test_perf_sqrt.T,
        log_reg_model_test_perf.T,
        log_reg_model_test_perf_threshold_auc_roc.T,
        log_reg_model_test_perf_threshold_curve.T,
    ],
    axis=1,
)
models_test_comp_df.columns = [
    "Logistic Regression sklearn",
    "Logistic Regression with Sqrt Transformation",
    "Logistic Regression with statsmodels",
    "Logistic Regression-0.32 Threshold",
    "Logistic Regression-0.43 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Logistic Regression sklearn | Logistic Regression with Sqrt Transformation | Logistic Regression with statsmodels | Logistic Regression-0.32 Threshold | Logistic Regression-0.43 Threshold |
|---|---|---|---|---|---|
| Accuracy | 0.790333 | 0.789626 | 0.790097 | 0.771343 | 0.786880 |
| Recall | 0.620393 | 0.631020 | 0.618359 | 0.793579 | 0.686186 |
| Precision | 0.734279 | 0.726823 | 0.734820 | 0.636908 | 0.695622 |
| F1 | 0.672549 | 0.675542 | 0.671578 | 0.706664 | 0.690872 |
# split data
X = df1.drop(["booking_canceled", "arrival_year"], axis=1)
y = df1["booking_canceled"].astype("int64")
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
# splitting in training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
# check training & test sets to make sure they are comparable
print("Shape of training set:", X_train.shape)
print("Shape of test set:", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of training set: (29733, 36)
Shape of test set: (12744, 36)
Percentage of classes in training set:
0    0.661588
1    0.338412
Name: booking_canceled, dtype: float64
Percentage of classes in test set:
0    0.652935
1    0.347065
Name: booking_canceled, dtype: float64
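The canceled share differs slightly between the two sets (about 33.8% in training versus 34.7% in test). If closer agreement were wanted, one option is a stratified split via the `stratify` argument of `train_test_split`. The sketch below uses synthetic data as an assumption. The notebook itself does not stratify, and its reported results come from the unstratified split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 1000 rows, about 34% positives
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.array([0] * 660 + [1] * 340)

# stratify=y_toy keeps the class proportions the same in both splits
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=1, stratify=y_toy
)

print("Share of class 1 in training set:", y_tr_s.mean())
print("Share of class 1 in test set:", y_te_s.mean())
```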
model = DecisionTreeClassifier(
    criterion="gini", class_weight="balanced", random_state=1
)
model.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', random_state=1)
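With `class_weight="balanced"`, scikit-learn weights each class by `n_samples / (n_classes * count_of_class)`, so the minority class (canceled bookings) counts for more. The sketch below reproduces that formula on toy labels whose class proportions only roughly mirror the training split printed above. The arrays are illustrative assumptions, not the notebook's data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (hypothetical): about 66% class 0 and 34% class 1
y_toy = np.array([0] * 662 + [1] * 338)

# Weights as computed by scikit-learn for class_weight="balanced"
balanced_weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_toy
)

# The same weights from the formula n_samples / (n_classes * class_count)
manual_weights = len(y_toy) / (2 * np.bincount(y_toy))

print("sklearn:", balanced_weights)
print("manual: ", manual_weights)
```

With these proportions the weights come out near 0.76 for class 0 and 1.48 for class 1, which is consistent with the leaf `weights` values shown in the tree's text report, where the smallest non-zero entries are 0.76 and 1.48.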
# confusion matrix of training data using model
confusion_matrix_statsmodels(model, X_train, y_train)
# training performance
decision_tree_perf_train = model_performance_classification_statsmodels(
    model, X_train, y_train
)
print("Training performance:")
decision_tree_perf_train
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.9962 | 1.0 | 0.988894 | 0.994416 |
# confusion matrix for test set using model
confusion_matrix_statsmodels(model, X_test, y_test)
# test performance
decision_tree_perf_test = model_performance_classification_statsmodels(
    model, X_test, y_test
)
print("Test performance:")
decision_tree_perf_test
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.787979 | 0.694551 | 0.694551 | 0.694551 |
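The gap between training and test scores (for example, recall of 1.0 versus about 0.69) suggests that the fully grown tree is overfitting. As a rough illustration only, the sketch below compares an unrestricted tree with a depth-limited one on synthetic data. The dataset, the choice of `max_depth=5`, and the variable names are all assumptions made for this example and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy, imbalanced data standing in for the bookings (hypothetical)
X_syn, y_syn = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=4,
    flip_y=0.1,
    weights=[0.66, 0.34],
    random_state=1,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=1
)

results = {}
for depth in (None, 5):  # None lets the tree grow until its leaves are pure
    clf = DecisionTreeClassifier(
        criterion="gini", class_weight="balanced", max_depth=depth, random_state=1
    )
    clf.fit(X_tr, y_tr)
    results[depth] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print("max_depth:", depth, "train/test accuracy:", results[depth])
```

Comparing the two printed pairs shows how much of the training score survives on held-out data at each depth. The right depth for the hotel data would have to be chosen on that data, for instance by cross-validation.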
# creating a list of column names
feature_names = X_train.columns.to_list()
# visualizing the tree
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
    model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# text report showing the rules of the decision tree
print(tree.export_text(model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 86.50 | | | | |--- avg_price_per_room <= 202.50 | | | | | |--- repeated_guest <= 0.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- avg_price_per_room <= 59.50 | | | | | | | | |--- weights: [77.09, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 59.50 | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | | |--- month_December > 0.50 | | | | | | | | | | | |--- weights: [24.94, 0.00] class: 0 | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | |--- weights: [30.23, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | |--- month_May <= 0.50 | | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | | | |--- month_May > 0.50 | | | | | | | | | | |--- weights: [0.00, 4.43] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- weights: [557.75, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- lead_time <= 41.50 | | | | | | | | | |--- month_September <= 0.50 | | | | | | | | | | |--- month_July <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- month_July > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- month_September > 0.50 | | | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- lead_time > 41.50 | | | | | | | | | |--- avg_price_per_room <= 76.77 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated 
branch of depth 7 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- avg_price_per_room > 76.77 | | | | | | | | | | |--- avg_price_per_room <= 78.10 | | | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | | | | |--- avg_price_per_room > 78.10 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | |--- repeated_guest > 0.50 | | | | | | |--- lead_time <= 33.00 | | | | | | | |--- weights: [267.54, 0.00] class: 0 | | | | | | |--- lead_time > 33.00 | | | | | | | |--- no_of_previous_bookings_not_canceled <= 10.50 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- weights: [18.14, 0.00] class: 0 | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | |--- no_of_previous_bookings_not_canceled > 10.50 | | | | | | | | |--- arrival_date <= 13.00 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- arrival_date > 13.00 | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | |--- avg_price_per_room > 202.50 | | | | | |--- month_December <= 0.50 | | | | | | |--- weights: [0.00, 10.34] class: 1 | | | | | |--- month_December > 0.50 | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | |--- lead_time > 86.50 | | | | |--- avg_price_per_room <= 93.33 | | | | | |--- lead_time <= 139.50 | | | | | | |--- month_December <= 0.50 | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | |--- lead_time <= 124.50 | | | | | | | | | |--- month_March <= 0.50 | | | | | | | | | | |--- weights: [7.56, 0.00] class: 0 | | | | | | | | | |--- month_March > 0.50 | | | | | | | | | | |--- weights: [13.60, 0.00] class: 0 | | | | | | | | |--- lead_time > 124.50 | | | | | | | | | |--- lead_time <= 125.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- lead_time > 125.50 | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | |--- 
avg_price_per_room > 58.75 | | | | | | | | |--- avg_price_per_room <= 61.12 | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | |--- weights: [0.00, 7.39] class: 1 | | | | | | | | |--- avg_price_per_room > 61.12 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 84.00 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 84.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- month_June <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- month_June > 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- month_December > 0.50 | | | | | | | |--- weights: [20.41, 0.00] class: 0 | | | | | |--- lead_time > 139.50 | | | | | | |--- arrival_date <= 4.00 | | | | | | | |--- lead_time <= 144.50 | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | |--- lead_time > 144.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | |--- arrival_date > 4.00 | | | | | | | |--- month_July <= 0.50 | | | | | | | | |--- weights: [38.54, 0.00] class: 0 | | | | | | | |--- month_July > 0.50 | | | | | | | | |--- arrival_date <= 12.00 | | | | | | | | | |--- arrival_date <= 9.00 | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 9.00 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- arrival_date > 12.00 | | | | | | | | | |--- weights: [4.53, 0.00] class: 0 | | | | |--- avg_price_per_room > 93.33 | | | | | |--- no_of_week_nights <= 1.50 | | | | | | |--- lead_time <= 114.00 | | | | | | | |--- month_June <= 0.50 | | | | 
| | | | |--- arrival_date <= 6.50 | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | |--- avg_price_per_room <= 98.69 | | | | | | | | | | |--- avg_price_per_room <= 97.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 97.50 | | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 98.69 | | | | | | | | | | |--- month_September <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- month_September > 0.50 | | | | | | | | | | | |--- weights: [0.00, 7.39] class: 1 | | | | | | | |--- month_June > 0.50 | | | | | | | | |--- lead_time <= 98.50 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- lead_time > 98.50 | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | |--- lead_time > 114.00 | | | | | | | |--- avg_price_per_room <= 114.28 | | | | | | | | |--- month_August <= 0.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- month_August > 0.50 | | | | | | | | | |--- avg_price_per_room <= 105.23 | | | | | | | | | | |--- lead_time <= 134.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 134.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 105.23 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | |--- avg_price_per_room > 114.28 | | | | | | | | |--- lead_time <= 139.50 | | | | | | | | | |--- weights: [7.56, 0.00] class: 0 | | | | | | | | |--- lead_time > 139.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | 
| | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | |--- no_of_week_nights > 1.50 | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | |--- avg_price_per_room <= 95.02 | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | |--- weights: [0.00, 7.39] class: 1 | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 95.02 | | | | | | | | |--- month_June <= 0.50 | | | | | | | | | |--- arrival_date <= 10.00 | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 10.00 | | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- month_June > 0.50 | | | | | | | | | |--- arrival_date <= 12.00 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_date > 12.00 | | | | | | | | | | |--- weights: [0.00, 5.91] class: 1 | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | |--- arrival_date <= 2.00 | | | | | | | | | |--- avg_price_per_room <= 98.55 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- avg_price_per_room > 98.55 | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | |--- arrival_date > 2.00 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [9.07, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [9.07, 0.00] class: 0 | | | | | | | |--- arrival_date > 11.50 | | | | | | | 
| |--- arrival_date <= 18.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- month_October <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- month_October > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | |--- weights: [15.87, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 9.50 | | | | |--- avg_price_per_room <= 200.38 | | | | | |--- lead_time <= 2.50 | | | | | | |--- month_January <= 0.50 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- month_October <= 0.50 | | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | |--- month_December > 0.50 | | | | | | | | | | |--- weights: [27.21, 0.00] class: 0 | | | | | | | | |--- month_October > 0.50 | | | | | | | | | |--- weights: [38.54, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- arrival_date <= 10.00 | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | |--- arrival_date > 10.00 | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | |--- month_January > 0.50 | | | | | | | |--- weights: [77.09, 
0.00] class: 0 | | | | | |--- lead_time > 2.50 | | | | | | |--- avg_price_per_room <= 99.42 | | | | | | | |--- month_January <= 0.50 | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 83.01 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 83.01 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 77.70 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 77.70 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- month_December > 0.50 | | | | | | | | | |--- weights: [35.52, 0.00] class: 0 | | | | | | | |--- month_January > 0.50 | | | | | | | | |--- weights: [56.68, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 99.42 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- avg_price_per_room <= 172.00 | | | | | | | | | |--- avg_price_per_room <= 161.50 | | | | | | | | | | |--- avg_price_per_room <= 102.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 102.00 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | |--- avg_price_per_room > 161.50 | | | | | | | | | | |--- arrival_date <= 24.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 24.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 172.00 | | | | | | | | | |--- weights: [15.12, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | |--- month_January <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- month_January > 0.50 | | | | | | | | | | | |--- weights: 
[4.53, 0.00] class: 0 | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | |--- weights: [4.53, 0.00] class: 0 | | | | | | | | |--- month_December > 0.50 | | | | | | | | | |--- weights: [6.80, 0.00] class: 0 | | | | |--- avg_price_per_room > 200.38 | | | | | |--- month_January <= 0.50 | | | | | | |--- month_December <= 0.50 | | | | | | | |--- weights: [0.00, 87.17] class: 1 | | | | | | |--- month_December > 0.50 | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | |--- month_January > 0.50 | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | |--- lead_time > 9.50 | | | | |--- required_car_parking_space <= 0.50 | | | | | |--- avg_price_per_room <= 105.28 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- month_January <= 0.50 | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 63.18 | | | | | | | | | | |--- lead_time <= 24.00 | | | | | | | | | | | |--- weights: [14.36, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 24.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 63.18 | | | | | | | | | | |--- month_September <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | | | | | | |--- month_September > 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | |--- month_December > 0.50 | | | | | | | | | |--- weights: [50.64, 0.00] class: 0 | | | | | | | |--- month_January > 0.50 | | | | | | | | |--- weights: [59.70, 0.00] class: 0 | | | | | | |--- lead_time > 25.50 | | | | | | | |--- avg_price_per_room <= 59.44 | | | | | | | | |--- month_July <= 0.50 | | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | |--- lead_time <= 34.50 | | | | | | 
| | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 34.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- month_July > 0.50 | | | | | | | | | |--- lead_time <= 68.50 | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | |--- lead_time > 68.50 | | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 59.44 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 74.69 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- avg_price_per_room > 74.69 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 25 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 68.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- lead_time > 68.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 60.81 | | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 60.81 | | | | | | | | | | | |--- truncated branch of depth 30 | | | | | |--- avg_price_per_room > 105.28 | | | | | | |--- avg_price_per_room <= 200.35 | | | | | | | |--- month_December <= 0.50 | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | |--- month_January <= 0.50 | | | | | | | | | | |--- no_of_previous_bookings_not_canceled <= 0.50 
| | | | | | | | | | | |--- truncated branch of depth 36 | | | | | | | | | | |--- no_of_previous_bookings_not_canceled > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- month_January > 0.50 | | | | | | | | | | |--- lead_time <= 34.50 | | | | | | | | | | | |--- weights: [10.58, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 34.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 127.44 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 127.44 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 12.00 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- no_of_week_nights > 12.00 | | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- lead_time <= 23.50 | | | | | | | | | |--- weights: [20.41, 0.00] class: 0 | | | | | | | | |--- lead_time > 23.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- lead_time <= 47.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 47.50 | | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 200.35 | | | | | | | |--- month_December <= 0.50 | | | | | | | | |--- month_January <= 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [0.00, 14.77] class: 1 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [0.00, 372.33] class: 1 | | | | | 
| | | |--- month_January > 0.50 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | |--- required_car_parking_space > 0.50 | | | | | |--- avg_price_per_room <= 212.00 | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | |--- weights: [70.29, 0.00] class: 0 | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | |--- avg_price_per_room > 212.00 | | | | | | |--- month_August <= 0.50 | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | |--- month_August > 0.50 | | | | | | | |--- weights: [0.00, 1.48] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- lead_time <= 7.50 | | | | |--- avg_price_per_room <= 140.62 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- avg_price_per_room <= 74.75 | | | | | | | |--- lead_time <= 6.50 | | | | | | | | |--- weights: [241.09, 0.00] class: 0 | | | | | | | |--- lead_time > 6.50 | | | | | | | | |--- month_February <= 0.50 | | | | | | | | | |--- weights: [12.09, 0.00] class: 0 | | | | | | | | |--- month_February > 0.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | |--- avg_price_per_room > 74.75 | | | | | | | |--- lead_time <= 1.50 | | | | | | | | |--- avg_price_per_room <= 74.88 | | | | | | | | | |--- month_March <= 0.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- month_March > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- avg_price_per_room > 74.88 | | | | | | | | | |--- avg_price_per_room <= 130.25 | | | | | | | | | | |--- month_October <= 0.50 | | | | | | | | | | | |--- weights: [209.34, 0.00] class: 0 | | | | | | | | | | |--- month_October > 0.50 | | | | | | | | | | | |--- 
truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 130.25 | | | | | | | | | | |--- avg_price_per_room <= 130.72 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | | |--- avg_price_per_room > 130.72 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | |--- lead_time > 1.50 | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | |--- month_February <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- month_February > 0.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- no_of_week_nights <= 13.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 13.50 | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | |--- avg_price_per_room > 140.62 | | | | | |--- lead_time <= 4.50 | | | | | | |--- avg_price_per_room <= 241.00 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 142.10 | | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | | |--- month_February <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- month_February > 0.50 | | | | | | | | | | | 
|--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 141.25 | | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 141.25 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 142.10 | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 16.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | |--- weights: [5.29, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- month_November <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- month_November > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | |--- avg_price_per_room > 241.00 | | | | | | | |--- arrival_date <= 17.00 | | | | | | | | |--- arrival_date <= 13.00 | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | |--- weights: [3.02, 0.00] class: 0 | | | | | | | | |--- arrival_date > 13.00 | | | | | | | | | |--- avg_price_per_room <= 249.00 | | | | | | | | | 
| |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- avg_price_per_room > 249.00 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | |--- arrival_date > 17.00 | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | |--- lead_time > 4.50 | | | | | | |--- arrival_date <= 23.50 | | | | | | | |--- month_September <= 0.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- month_November <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- month_November > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [13.60, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- month_June <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- month_June > 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 5.91] class: 1 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | |--- month_September > 0.50 | | | | | | | | |--- avg_price_per_room <= 168.21 | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | |--- arrival_date <= 16.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 16.00 | | | | | | | | | | | |--- weights: [0.00, 4.43] class: 1 | | | | | | | | |--- avg_price_per_room > 168.21 | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | |--- arrival_date > 23.50 | | | | | | | |--- 
weights: [19.65, 0.00] class: 0 | | | |--- lead_time > 7.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 94.50 | | | | | | |--- no_of_week_nights <= 7.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- avg_price_per_room <= 170.27 | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | |--- weights: [379.39, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | |--- avg_price_per_room <= 131.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 131.50 | | | | | | | | | | | |--- weights: [15.87, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 170.27 | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | |--- weights: [15.87, 0.00] class: 0 | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | |--- lead_time <= 12.50 | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | |--- lead_time > 12.50 | | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | | |--- weights: [0.76, 1.48] class: 1 | | | | | | |--- no_of_week_nights > 7.50 | | | | | | | |--- avg_price_per_room <= 99.12 | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 99.12 | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | |--- lead_time > 94.50 | | | | | | |--- avg_price_per_room <= 86.60 | | | | | | | |--- no_of_weekend_nights <= 3.00 | | | | | | | | |--- month_September <= 0.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | 
| | | | | | | | |--- lead_time <= 106.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 106.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- lead_time <= 140.00 | | | | | | | | | | | |--- weights: [48.37, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 140.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- month_September > 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | |--- no_of_weekend_nights > 3.00 | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | |--- avg_price_per_room > 86.60 | | | | | | | |--- avg_price_per_room <= 89.55 | | | | | | | | |--- lead_time <= 128.00 | | | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | | | |--- month_May <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- month_May > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 13.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 128.00 | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 89.55 | | | | | | | | |--- avg_price_per_room <= 120.50 | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | |--- weights: [15.87, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | |--- lead_time <= 106.50 | | | | | | | | | | | |--- weights: [6.05, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 106.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- 
avg_price_per_room > 120.50 | | | | | | | | | |--- arrival_date <= 18.00 | | | | | | | | | | |--- month_August <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- month_August > 0.50 | | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 18.00 | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 121.97 | | | | | | |--- month_January <= 0.50 | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | |--- avg_price_per_room <= 67.41 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 69.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 69.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- avg_price_per_room > 67.41 | | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 29 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- month_December > 0.50 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | | |--- lead_time <= 121.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time > 121.00 | | | | | | | | | | | |--- truncated branch of 
depth 2 | | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | | |--- weights: [0.00, 22.16] class: 1 | | | | | | | | |--- month_December > 0.50 | | | | | | | | | |--- avg_price_per_room <= 71.44 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 71.44 | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | |--- month_January > 0.50 | | | | | | | |--- lead_time <= 99.00 | | | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | | | |--- weights: [206.32, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | | | |--- lead_time <= 33.00 | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | | | |--- lead_time > 33.00 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | |--- lead_time > 99.00 | | | | | | | | |--- avg_price_per_room <= 97.74 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- lead_time <= 131.00 | | | | | | | | | | | |--- weights: [5.29, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 131.00 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 5.91] class: 1 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 97.74 | | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | | |--- weights: [0.00, 10.34] class: 1 | | | | | |--- avg_price_per_room > 121.97 | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- avg_price_per_room <= 179.62 | | | | | | | | | |--- lead_time <= 31.50 | | | | | | | | | | |--- avg_price_per_room <= 149.36 | | | | | | | | | | | |--- truncated branch of depth 
16 | | | | | | | | | | |--- avg_price_per_room > 149.36 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | | | | | |--- lead_time > 31.50 | | | | | | | | | | |--- month_October <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 29 | | | | | | | | | | |--- month_October > 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | |--- avg_price_per_room > 179.62 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- lead_time <= 93.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- lead_time > 93.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- lead_time <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 14.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | |--- avg_price_per_room <= 148.00 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 148.00 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | |--- month_September <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- month_September > 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | |--- required_car_parking_space > 0.50 | 
| | | | | | |--- month_August <= 0.50 | | | | | | | | |--- weights: [70.29, 0.00] class: 0 | | | | | | | |--- month_August > 0.50 | | | | | | | | |--- weights: [26.45, 0.00] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- avg_price_per_room <= 0.50 | | | | | | |--- weights: [69.53, 0.00] class: 0 | | | | | |--- avg_price_per_room > 0.50 | | | | | | |--- weights: [2177.34, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- lead_time <= 8.50 | | | | | | | | |--- month_October <= 0.50 | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | |--- weights: [46.10, 0.00] class: 0 | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | |--- month_August <= 0.50 | | | | | | | | | | | |--- weights: [6.05, 0.00] class: 0 | | | | | | | | | | |--- month_August > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- month_October > 0.50 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | |--- lead_time > 8.50 | | | | | | | | |--- month_January <= 0.50 | | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | | |--- avg_price_per_room <= 91.10 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- avg_price_per_room > 91.10 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | |--- month_January > 0.50 | | | | | | | | | |--- weights: [23.43, 0.00] 
class: 0 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [73.31, 0.00] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- no_of_previous_bookings_not_canceled <= 0.50 | | | | | | | | |--- weights: [0.00, 10.34] class: 1 | | | | | | | |--- no_of_previous_bookings_not_canceled > 0.50 | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.50 | | | | | |--- avg_price_per_room <= 200.70 | | | | | | |--- avg_price_per_room <= 93.97 | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- month_February <= 0.50 | | | | | | | | | | | |--- weights: [18.14, 0.00] class: 0 | | | | | | | | | | |--- month_February > 0.50 | | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 71.92 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 71.92 | | | | | | | | | | | |--- weights: [0.76, 1.48] class: 1 | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | |--- avg_price_per_room <= 73.78 | | | | | | | | | |--- lead_time <= 92.50 | | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | | | |--- lead_time > 92.50 | | | | | | | | | | |--- lead_time <= 145.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 145.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- avg_price_per_room > 73.78 | | | | | | | | | |--- month_January <= 0.50 | | | | | | | | | | 
|--- month_November <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- month_November > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- month_January > 0.50 | | | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | | | |--- weights: [0.00, 13.30] class: 1 | | | | | | |--- avg_price_per_room > 93.97 | | | | | | | |--- month_July <= 0.50 | | | | | | | | |--- month_June <= 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 102.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 102.50 | | | | | | | | | | | |--- weights: [10.58, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [8.31, 0.00] class: 0 | | | | | | | | |--- month_June > 0.50 | | | | | | | | | |--- avg_price_per_room <= 179.10 | | | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | | | |--- weights: [30.99, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 179.10 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | |--- month_July > 0.50 | | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | | |--- avg_price_per_room <= 149.35 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- 
avg_price_per_room > 149.35 | | | | | | | | | | |--- lead_time <= 112.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 112.50 | | | | | | | | | | | |--- weights: [5.29, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | | |--- arrival_date <= 18.00 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- arrival_date > 18.00 | | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | |--- avg_price_per_room > 200.70 | | | | | | |--- weights: [0.00, 33.98] class: 1 | | | | |--- no_of_special_requests > 2.50 | | | | | |--- weights: [129.23, 0.00] class: 0 |--- lead_time > 150.50 | |--- no_of_special_requests <= 2.50 | | |--- avg_price_per_room <= 100.04 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- lead_time <= 270.50 | | | | | |--- avg_price_per_room <= 89.76 | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | |--- lead_time <= 159.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 87.00 | | | | | | | | | | | |--- weights: [21.16, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 87.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- lead_time > 159.50 | | | | | | | | | |--- month_February <= 0.50 | | | | | | | | | | |--- month_November <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- month_November > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | | | |--- month_February > 0.50 | | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | |--- arrival_date > 25.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | 
| | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- lead_time <= 177.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 177.00 | | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- weights: [30.99, 0.00] class: 0 | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | |--- lead_time <= 151.50 | | | | | | | | |--- month_July <= 0.50 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- month_July > 0.50 | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | |--- lead_time > 151.50 | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- weights: [6.05, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | |--- weights: [66.51, 0.00] class: 0 | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | |--- lead_time <= 168.00 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- lead_time > 168.00 | | | | | | | | | | |--- weights: [3.02, 0.00] class: 0 | | | | | |--- avg_price_per_room > 89.76 | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | |--- month_May <= 0.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- weights: [6.80, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- month_September <= 0.50 | | | | | | | | | | |--- month_August <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- month_August > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- month_September > 0.50 | | | | | | | | | | |--- arrival_date <= 24.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- 
arrival_date > 24.00 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | |--- month_May > 0.50 | | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 21.00 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 21.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [0.00, 10.34] class: 1 | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | |--- arrival_date <= 30.00 | | | | | | | | |--- lead_time <= 193.00 | | | | | | | | | |--- weights: [15.87, 0.00] class: 0 | | | | | | | | |--- lead_time > 193.00 | | | | | | | | | |--- lead_time <= 201.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- lead_time > 201.50 | | | | | | | | | | |--- month_June <= 0.50 | | | | | | | | | | | |--- weights: [6.05, 0.00] class: 0 | | | | | | | | | | |--- month_June > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_date > 30.00 | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | |--- lead_time > 270.50 | | | | | |--- lead_time <= 361.50 | | | | | | |--- avg_price_per_room <= 84.38 | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- month_May <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- month_May > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 45.50 | | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | | |--- 
avg_price_per_room > 45.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | |--- arrival_date > 18.50 | | | | | | | | |--- market_segment_type_Complementary <= 0.50 | | | | | | | | | |--- weights: [7.56, 0.00] class: 0 | | | | | | | | |--- market_segment_type_Complementary > 0.50 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 84.38 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- lead_time <= 327.50 | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- weights: [0.00, 10.34] class: 1 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- lead_time > 327.50 | | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | | |--- month_June <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- month_June > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- month_July <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- month_July > 0.50 | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- weights: [3.02, 0.00] class: 0 | | | | | |--- lead_time > 361.50 | | | | | | |--- 
type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | |--- weights: [0.00, 47.28] class: 1 | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | |--- month_August <= 0.50 | | | | | | | | |--- lead_time <= 411.50 | | | | | | | | | |--- avg_price_per_room <= 90.75 | | | | | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 90.75 | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | |--- lead_time > 411.50 | | | | | | | | | |--- weights: [0.00, 2.95] class: 1 | | | | | | | |--- month_August > 0.50 | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- avg_price_per_room <= 2.50 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- weights: [0.76, 1.48] class: 1 | | | | | | | |--- arrival_date > 8.50 | | | | | | | | |--- weights: [5.29, 0.00] class: 0 | | | | | | |--- lead_time > 285.50 | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | |--- weights: [0.00, 8.86] class: 1 | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | |--- avg_price_per_room > 2.50 | | | | | | |--- no_of_adults <= 2.50 | | | | | | | |--- month_December <= 0.50 | | | | | | | | |--- weights: [0.00, 879.11] class: 1 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- lead_time <= 275.50 | | | | | | | | | | |--- arrival_date <= 24.00 | | | | | | | | | | | |--- weights: [3.02, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 24.00 | | | | | | | | | | | |--- 
weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- lead_time > 275.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [0.00, 4.43] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- lead_time <= 223.50 | | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- lead_time > 223.50 | | | | | | | | | | |--- avg_price_per_room <= 58.92 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | | |--- avg_price_per_room > 58.92 | | | | | | | | | | | |--- weights: [0.00, 54.67] class: 1 | | | | | | |--- no_of_adults > 2.50 | | | | | | | |--- avg_price_per_room <= 88.41 | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 88.41 | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- lead_time <= 180.50 | | | | | | | |--- month_February <= 0.50 | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | |--- month_November <= 0.50 | | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- month_November > 0.50 | | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | |--- weights: [16.63, 0.00] class: 0 | | | | | | | |--- month_February > 0.50 | | | | | | | | |--- avg_price_per_room <= 
80.10 | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 80.10 | | | | | | | | | |--- weights: [0.00, 4.43] class: 1 | | | | | | |--- lead_time > 180.50 | | | | | | | |--- month_December <= 0.50 | | | | | | | | |--- avg_price_per_room <= 36.16 | | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 36.16 | | | | | | | | | |--- month_January <= 0.50 | | | | | | | | | | |--- weights: [0.00, 268.90] class: 1 | | | | | | | | | |--- month_January > 0.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [0.00, 4.43] class: 1 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [0.00, 5.91] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- weights: [5.29, 0.00] class: 0 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 75.02 | | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 75.02 | | | | | | | | | | | |--- weights: [0.00, 4.43] class: 1 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- lead_time <= 198.50 | | | | | | | |--- month_March <= 0.50 | | | | | | | | |--- avg_price_per_room <= 97.73 | | | | | | | | | |--- lead_time <= 182.50 | | | | | | | | | | |--- month_February <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- 
month_February > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 182.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [22.67, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 97.73 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- month_March > 0.50 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- arrival_date <= 8.00 | | | | | | | | | | |--- weights: [0.00, 4.43] class: 1 | | | | | | | | | |--- arrival_date > 8.00 | | | | | | | | | | |--- arrival_date <= 24.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 24.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- weights: [0.00, 5.91] class: 1 | | | | | | |--- lead_time > 198.50 | | | | | | | |--- lead_time <= 330.50 | | | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 77.29 | | | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- month_December > 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- avg_price_per_room > 77.29 | | | | | | | | | | |--- avg_price_per_room <= 80.72 | | | | | | | | | | | |--- truncated branch of depth 9 | | | 
| | | | | | | |--- avg_price_per_room > 80.72 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | | | |--- no_of_weekend_nights <= 3.00 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 3.00 | | | | | | | | | | |--- weights: [0.00, 7.39] class: 1 | | | | | | | |--- lead_time > 330.50 | | | | | | | | |--- month_February <= 0.50 | | | | | | | | | |--- month_September <= 0.50 | | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | | |--- weights: [0.00, 16.25] class: 1 | | | | | | | | | |--- month_September > 0.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- month_February > 0.50 | | | | | | | | | |--- avg_price_per_room <= 92.80 | | | | | | | | | | |--- lead_time <= 346.50 | | | | | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 346.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 92.80 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | |--- avg_price_per_room > 100.04 | | | |--- month_December <= 0.50 | | | | |--- month_January <= 0.50 | | | | | |--- weights: [0.00, 3885.80] class: 1 | | | | |--- month_January > 0.50 | | | | | |--- no_of_special_requests <= 0.50 | | | | | | |--- weights: [4.53, 0.00] class: 0 | | | | | |--- no_of_special_requests > 0.50 | | | | | | |--- lead_time <= 183.50 | | | | | | | |--- weights: [1.51, 0.00] class: 0 | | | | | | |--- lead_time > 183.50 | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | |--- avg_price_per_room <= 105.72 | | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | | | | |--- 
avg_price_per_room > 105.72 | | | | | | | | | |--- weights: [0.00, 11.82] class: 1 | | | | | | | |--- arrival_date > 16.50 | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | |--- month_December > 0.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- arrival_date <= 6.50 | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | |--- arrival_date > 6.50 | | | | | | |--- weights: [27.96, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- arrival_date <= 24.00 | | | | | | |--- avg_price_per_room <= 156.82 | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 156.82 | | | | | | | |--- weights: [0.00, 1.48] class: 1 | | | | | |--- arrival_date > 24.00 | | | | | | |--- avg_price_per_room <= 106.04 | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 106.04 | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | | |--- weights: [0.00, 5.91] class: 1 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.76, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- weights: [0.00, 13.30] class: 1 | |--- no_of_special_requests > 2.50 | | |--- month_February <= 0.50 | | | |--- weights: [159.46, 0.00] class: 0 | | |--- month_February > 0.50 | | | |--- weights: [2.27, 0.00] class: 0
# importance of features in the tree building
# (The importance of a feature is computed as the (normalized)
# total reduction of the criterion brought by that feature.
# It is also known as the Gini importance)
print(
pd.DataFrame(
model.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| lead_time | 3.417093e-01 |
| avg_price_per_room | 1.454052e-01 |
| no_of_special_requests | 1.080493e-01 |
| arrival_date | 9.539203e-02 |
| market_segment_type_Online | 9.517904e-02 |
| no_of_week_nights | 4.925279e-02 |
| no_of_weekend_nights | 3.023794e-02 |
| no_of_adults | 1.536033e-02 |
| month_December | 1.171987e-02 |
| required_car_parking_space | 1.009128e-02 |
| type_of_meal_plan_Not Selected | 8.510193e-03 |
| month_July | 8.018408e-03 |
| month_January | 7.583812e-03 |
| month_August | 7.560938e-03 |
| month_June | 7.081513e-03 |
| month_February | 6.859950e-03 |
| month_October | 6.330277e-03 |
| month_September | 6.300855e-03 |
| month_March | 6.255597e-03 |
| month_May | 6.211668e-03 |
| room_type_reserved_Room_Type 4 | 5.848649e-03 |
| month_November | 5.569036e-03 |
| no_of_children | 4.015156e-03 |
| type_of_meal_plan_Meal Plan 2 | 3.071321e-03 |
| room_type_reserved_Room_Type 5 | 1.907544e-03 |
| room_type_reserved_Room_Type 2 | 1.517031e-03 |
| market_segment_type_Offline | 1.461353e-03 |
| room_type_reserved_Room_Type 6 | 9.983025e-04 |
| room_type_reserved_Room_Type 7 | 8.849796e-04 |
| repeated_guest | 7.447629e-04 |
| market_segment_type_Corporate | 4.711456e-04 |
| no_of_previous_bookings_not_canceled | 4.004417e-04 |
| no_of_previous_cancellations | 5.402827e-19 |
| market_segment_type_Complementary | 4.379134e-19 |
| room_type_reserved_Room_Type 3 | 0.000000e+00 |
| type_of_meal_plan_Meal Plan 3 | 0.000000e+00 |
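The comment above describes the importance of a feature as the normalized total reduction of the criterion brought by that feature. As a minimal sketch of what that means, the snippet below recomputes the importances by hand from a fitted tree's internal arrays and can be compared with `feature_importances_`. The data is synthetic (`make_classification`), and the names `X_demo`, `y_demo`, `demo_tree` and `manual_importances` are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the hotel bookings data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=1
)
demo_tree = DecisionTreeClassifier(random_state=1, max_depth=4).fit(X_demo, y_demo)


def manual_importances(fitted_tree, n_features):
    """Recompute impurity-based importances from the fitted tree structure."""
    t = fitted_tree.tree_
    totals = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf: no split, so no impurity reduction
            continue
        # weighted impurity of the parent minus that of its two children
        reduction = (
            t.weighted_n_node_samples[node] * t.impurity[node]
            - t.weighted_n_node_samples[left] * t.impurity[left]
            - t.weighted_n_node_samples[right] * t.impurity[right]
        )
        totals[t.feature[node]] += reduction
    return totals / totals.sum()  # normalize so the importances sum to 1


manual = manual_importances(demo_tree, X_demo.shape[1])
print(np.round(manual, 4))
print(np.round(demo_tree.feature_importances_, 4))
```

Features that are never used in a split receive an importance of exactly zero, which is why several one-hot columns appear with 0.0 in the listing.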
# plot importances
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
lead_time is the most important variable for predicting cancellation.
# choose the type of classifier
estimator = DecisionTreeClassifier(random_state=1, class_weight="balanced")
# grid of parameters to choose from
parameters = {
"max_depth": np.arange(2, 9, 2),
"criterion": ["entropy", "gini"],
"splitter": ["best", "random"],
# "min_samples_split": [50, 100],
"min_impurity_decrease": [0.0001, 0.001, 0.01],
}
# type of scoring used to compare parameter combinations
scorer = make_scorer(f1_score)
# run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# fit the best algorithm to the data
estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', max_depth=8,
min_impurity_decrease=0.0001, random_state=1)
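The grid search above can also be inspected through `best_params_` and `best_score_`. The sketch below runs a smaller grid on synthetic data; `X_demo`, `y_demo`, `demo_grid` and `tuned` are hypothetical names. Because `GridSearchCV` uses `refit=True` by default, `best_estimator_` is already fitted on the full data passed to `fit`, so a separate call to `fit` afterwards is harmless but not required.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in data (an assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.67, 0.33], random_state=1
)

demo_grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1, class_weight="balanced"),
    {"max_depth": [2, 4, 6], "criterion": ["entropy", "gini"]},
    scoring=make_scorer(f1_score),
    cv=5,
)
demo_grid.fit(X_demo, y_demo)

# best parameter combination and its mean cross-validated F1 score
print(demo_grid.best_params_)
print(round(demo_grid.best_score_, 3))

# with the default refit=True, best_estimator_ is ready to predict
tuned = demo_grid.best_estimator_
preds = tuned.predict(X_demo)
```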
# confusion matrix of training set using estimator
confusion_matrix_statsmodels(estimator, X_train, y_train)
# check performance on training set
decision_tree_tune_perf_train = model_performance_classification_statsmodels(
estimator, X_train, y_train
)
print("Training set performance:")
decision_tree_tune_perf_train
Training set performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.821343 | 0.774697 | 0.719096 | 0.745862 |
# confusion matrix of test set using estimator
confusion_matrix_statsmodels(estimator, X_test, y_test)
# check performance on the test set
decision_tree_tune_perf_test = model_performance_classification_statsmodels(
estimator, X_test, y_test
)
print("Test set performance:")
decision_tree_tune_perf_test
Test set performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.811441 | 0.76577 | 0.712453 | 0.73815 |
# visualizing the tree
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
estimator,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
# text report showing the rules of the decision tree
print(tree.export_text(estimator, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 86.50 | | | | |--- avg_price_per_room <= 202.50 | | | | | |--- repeated_guest <= 0.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- avg_price_per_room <= 59.50 | | | | | | | | |--- weights: [77.09, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 59.50 | | | | | | | | |--- weights: [421.71, 164.00] class: 0 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- weights: [557.75, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- weights: [791.28, 146.27] class: 0 | | | | | |--- repeated_guest > 0.50 | | | | | | |--- weights: [287.19, 1.48] class: 0 | | | | |--- avg_price_per_room > 202.50 | | | | | |--- weights: [0.76, 10.34] class: 1 | | | |--- lead_time > 86.50 | | | | |--- avg_price_per_room <= 93.33 | | | | | |--- lead_time <= 139.50 | | | | | | |--- month_December <= 0.50 | | | | | | | |--- weights: [230.51, 97.51] class: 0 | | | | | | |--- month_December > 0.50 | | | | | | | |--- weights: [20.41, 0.00] class: 0 | | | | | |--- lead_time > 139.50 | | | | | | |--- weights: [47.61, 4.43] class: 0 | | | | |--- avg_price_per_room > 93.33 | | | | | |--- no_of_week_nights <= 1.50 | | | | | | |--- weights: [27.21, 42.85] class: 1 | | | | | |--- no_of_week_nights > 1.50 | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | |--- weights: [24.94, 32.50] class: 1 | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | |--- weights: [63.48, 19.21] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 9.50 | | | | |--- avg_price_per_room <= 200.38 | | | | | |--- lead_time <= 2.50 | | | | | | |--- weights: [514.67, 75.35] class: 0 | | | | | |--- lead_time > 2.50 | | | | | | |--- avg_price_per_room <= 99.42 | | | | | | | |--- month_January <= 0.50 | | | | | | | | |--- weights: [167.78, 
70.92] class: 0 | | | | | | | |--- month_January > 0.50 | | | | | | | | |--- weights: [56.68, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 99.42 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- weights: [97.49, 54.67] class: 0 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [79.35, 137.41] class: 1 | | | | |--- avg_price_per_room > 200.38 | | | | | |--- weights: [1.51, 87.17] class: 1 | | | |--- lead_time > 9.50 | | | | |--- required_car_parking_space <= 0.50 | | | | | |--- avg_price_per_room <= 105.28 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- month_January <= 0.50 | | | | | | | | |--- weights: [216.90, 220.15] class: 1 | | | | | | | |--- month_January > 0.50 | | | | | | | | |--- weights: [59.70, 0.00] class: 0 | | | | | | |--- lead_time > 25.50 | | | | | | | |--- avg_price_per_room <= 59.44 | | | | | | | | |--- weights: [46.86, 29.55] class: 0 | | | | | | | |--- avg_price_per_room > 59.44 | | | | | | | | |--- weights: [560.77, 1456.80] class: 1 | | | | | |--- avg_price_per_room > 105.28 | | | | | | |--- avg_price_per_room <= 200.35 | | | | | | | |--- month_December <= 0.50 | | | | | | | | |--- weights: [833.60, 3061.36] class: 1 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- weights: [33.25, 44.32] class: 1 | | | | | | |--- avg_price_per_room > 200.35 | | | | | | | |--- month_December <= 0.50 | | | | | | | | |--- weights: [0.76, 387.10] class: 1 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- weights: [3.78, 0.00] class: 0 | | | | |--- required_car_parking_space > 0.50 | | | | | |--- avg_price_per_room <= 212.00 | | | | | | |--- weights: [70.29, 1.48] class: 0 | | | | | |--- avg_price_per_room > 212.00 | | | | | | |--- weights: [0.00, 2.95] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- lead_time <= 7.50 | | | | |--- avg_price_per_room <= 140.62 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- weights: [942.43, 
56.14] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- weights: [1.51, 4.43] class: 1 | | | | |--- avg_price_per_room > 140.62 | | | | | |--- lead_time <= 4.50 | | | | | | |--- weights: [185.92, 23.64] class: 0 | | | | | |--- lead_time > 4.50 | | | | | | |--- arrival_date <= 23.50 | | | | | | | |--- weights: [53.66, 28.07] class: 0 | | | | | | |--- arrival_date > 23.50 | | | | | | | |--- weights: [19.65, 0.00] class: 0 | | | |--- lead_time > 7.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 94.50 | | | | | | |--- weights: [418.69, 8.86] class: 0 | | | | | |--- lead_time > 94.50 | | | | | | |--- avg_price_per_room <= 86.60 | | | | | | | |--- weights: [64.24, 8.86] class: 0 | | | | | | |--- avg_price_per_room > 86.60 | | | | | | | |--- avg_price_per_room <= 89.55 | | | | | | | | |--- weights: [6.05, 10.34] class: 1 | | | | | | | |--- avg_price_per_room > 89.55 | | | | | | | | |--- weights: [35.52, 11.82] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 121.97 | | | | | | |--- month_January <= 0.50 | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | |--- weights: [2150.89, 1121.41] class: 0 | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | |--- weights: [9.07, 41.37] class: 1 | | | | | | |--- month_January > 0.50 | | | | | | | |--- lead_time <= 99.00 | | | | | | | | |--- weights: [210.10, 1.48] class: 0 | | | | | | | |--- lead_time > 99.00 | | | | | | | | |--- weights: [8.31, 20.68] class: 1 | | | | | |--- avg_price_per_room > 121.97 | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- weights: [1159.33, 995.83] class: 0 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- weights: [196.50, 304.36] class: 1 | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | |--- weights: [96.74, 0.00] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- 
lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [2246.87, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- lead_time <= 8.50 | | | | | | | | |--- weights: [56.68, 2.95] class: 0 | | | | | | | |--- lead_time > 8.50 | | | | | | | | |--- weights: [198.76, 73.87] class: 0 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [73.31, 0.00] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- weights: [0.76, 10.34] class: 1 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [2.27, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.50 | | | | | |--- avg_price_per_room <= 200.70 | | | | | | |--- avg_price_per_room <= 93.97 | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | |--- weights: [21.16, 2.95] class: 0 | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | |--- weights: [69.53, 90.13] class: 1 | | | | | | |--- avg_price_per_room > 93.97 | | | | | | | |--- month_July <= 0.50 | | | | | | | | |--- weights: [248.64, 152.18] class: 0 | | | | | | | |--- month_July > 0.50 | | | | | | | | |--- weights: [56.68, 13.30] class: 0 | | | | | |--- avg_price_per_room > 200.70 | | | | | | |--- weights: [0.00, 33.98] class: 1 | | | | |--- no_of_special_requests > 2.50 | | | | | |--- weights: [129.23, 0.00] class: 0 |--- lead_time > 150.50 | |--- no_of_special_requests <= 2.50 | | |--- avg_price_per_room <= 100.04 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- lead_time <= 270.50 | | | | | |--- avg_price_per_room <= 89.76 | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | |--- weights: [121.68, 53.19] class: 0 | | | | | | | |--- arrival_date > 25.50 | | | | | | | | |--- weights: [38.54, 2.95] class: 0 | | 
| | | | |--- no_of_special_requests > 0.50 | | | | | | | |--- lead_time <= 151.50 | | | | | | | | |--- weights: [0.76, 2.95] class: 1 | | | | | | | |--- lead_time > 151.50 | | | | | | | | |--- weights: [75.58, 2.95] class: 0 | | | | | |--- avg_price_per_room > 89.76 | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | |--- month_May <= 0.50 | | | | | | | | |--- weights: [21.16, 22.16] class: 1 | | | | | | | |--- month_May > 0.50 | | | | | | | | |--- weights: [1.51, 14.77] class: 1 | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | |--- weights: [22.67, 4.43] class: 0 | | | | |--- lead_time > 270.50 | | | | | |--- lead_time <= 361.50 | | | | | | |--- avg_price_per_room <= 84.38 | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | |--- weights: [15.12, 14.77] class: 0 | | | | | | | |--- arrival_date > 18.50 | | | | | | | | |--- weights: [8.31, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 84.38 | | | | | | | |--- weights: [18.14, 39.89] class: 1 | | | | | |--- lead_time > 361.50 | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | |--- weights: [0.00, 47.28] class: 1 | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | |--- weights: [5.29, 7.39] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- avg_price_per_room <= 2.50 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- weights: [6.80, 1.48] class: 0 | | | | | | |--- lead_time > 285.50 | | | | | | | |--- weights: [0.76, 8.86] class: 1 | | | | | |--- avg_price_per_room > 2.50 | | | | | | |--- weights: [6.05, 954.46] class: 1 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- lead_time <= 180.50 | | | | | | | |--- month_February <= 0.50 | | | | | | | | |--- weights: [60.46, 23.64] class: 0 | | | | | | | |--- month_February > 0.50 | | | | | | | | |--- weights: [0.76, 4.43] class: 1 | | | | | | |--- lead_time > 180.50 | | | | | | | |--- 
month_December <= 0.50 | | | | | | | | |--- weights: [3.02, 273.34] class: 1 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- weights: [6.80, 11.82] class: 1 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- lead_time <= 198.50 | | | | | | | |--- month_March <= 0.50 | | | | | | | | |--- weights: [131.50, 78.31] class: 0 | | | | | | | |--- month_March > 0.50 | | | | | | | | |--- weights: [8.31, 16.25] class: 1 | | | | | | |--- lead_time > 198.50 | | | | | | | |--- lead_time <= 330.50 | | | | | | | | |--- weights: [146.62, 168.43] class: 1 | | | | | | | |--- lead_time > 330.50 | | | | | | | | |--- weights: [4.53, 22.16] class: 1 | | |--- avg_price_per_room > 100.04 | | | |--- month_December <= 0.50 | | | | |--- month_January <= 0.50 | | | | | |--- weights: [0.00, 3885.80] class: 1 | | | | |--- month_January > 0.50 | | | | | |--- no_of_special_requests <= 0.50 | | | | | | |--- weights: [4.53, 0.00] class: 0 | | | | | |--- no_of_special_requests > 0.50 | | | | | | |--- weights: [3.02, 13.30] class: 1 | | | |--- month_December > 0.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- weights: [28.72, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- arrival_date <= 24.00 | | | | | | |--- weights: [4.53, 1.48] class: 0 | | | | | |--- arrival_date > 24.00 | | | | | | |--- weights: [3.78, 25.12] class: 1 | |--- no_of_special_requests > 2.50 | | |--- weights: [161.73, 0.00] class: 0
Using the decision rules extracted above, we can interpret the decision tree model. For example:
* For a booking with a lead time of at most 86.5 days, no special requests and an average price per room of at most 202.50 euros, made by a non-repeat guest through neither the Online nor the Offline market segment, the tree predicts that the booking will not be canceled. If the average price per room is at most 59.50 euros, the leaf is pure (weights [77.09, 0.00]). If it is greater than 59.50 euros, the prediction is still class 0, but with less certainty (weights [421.71, 164.00], roughly 72% of the weighted samples not canceled).
Other decision rules can be interpreted in the same way.
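Reading rules off the printed tree by hand is error-prone, so it can help to let scikit-learn trace the path for a given booking. The following is a minimal sketch on synthetic data: `decision_path` returns the nodes a sample passes through, and the split feature and threshold of each node come from the `tree_` attribute. The helper `explain_sample` and the names `X_demo`, `demo_names` and `demo_tree` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with made-up feature names (an assumption)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=1
)
demo_names = [f"feature_{i}" for i in range(X_demo.shape[1])]
demo_tree = DecisionTreeClassifier(
    random_state=1, max_depth=3, class_weight="balanced"
).fit(X_demo, y_demo)


def explain_sample(fitted_tree, names, row):
    """Return the split conditions one sample satisfies on its way to a leaf."""
    t = fitted_tree.tree_
    node_ids = fitted_tree.decision_path(row.reshape(1, -1)).indices
    rules = []
    for node in node_ids:
        if t.children_left[node] == -1:  # reached the leaf
            break
        feat, thr = t.feature[node], t.threshold[node]
        op = "<=" if row[feat] <= thr else ">"
        rules.append(f"{names[feat]} {op} {thr:.2f}")
    return rules, node_ids[-1]


sample = X_demo[0]
rules, leaf_id = explain_sample(demo_tree, demo_names, sample)
print(" AND ".join(rules))
print("predicted class:", demo_tree.predict(sample.reshape(1, -1))[0])
print("weighted class shares in the leaf:", demo_tree.predict_proba(sample.reshape(1, -1))[0])
```

The predicted class is the one with the larger weighted share in the leaf, which is the same reading as comparing the two numbers in a `weights: [...]` entry of the text report.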
# importance of features in the tree building
print(
pd.DataFrame(
estimator.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| lead_time | 0.443857 |
| no_of_special_requests | 0.215326 |
| market_segment_type_Online | 0.190662 |
| avg_price_per_room | 0.083628 |
| required_car_parking_space | 0.015646 |
| no_of_week_nights | 0.011815 |
| no_of_weekend_nights | 0.009792 |
| month_January | 0.009545 |
| month_December | 0.008776 |
| market_segment_type_Offline | 0.002803 |
| type_of_meal_plan_Not Selected | 0.002349 |
| arrival_date | 0.001967 |
| repeated_guest | 0.001335 |
| month_July | 0.000584 |
| month_May | 0.000505 |
| month_March | 0.000499 |
| type_of_meal_plan_Meal Plan 2 | 0.000475 |
| month_February | 0.000438 |
| month_June | 0.000000 |
| month_November | 0.000000 |
| month_October | 0.000000 |
| month_August | 0.000000 |
| no_of_adults | 0.000000 |
| room_type_reserved_Room_Type 5 | 0.000000 |
| market_segment_type_Corporate | 0.000000 |
| market_segment_type_Complementary | 0.000000 |
| room_type_reserved_Room_Type 7 | 0.000000 |
| room_type_reserved_Room_Type 6 | 0.000000 |
| no_of_children | 0.000000 |
| room_type_reserved_Room_Type 4 | 0.000000 |
| room_type_reserved_Room_Type 3 | 0.000000 |
| room_type_reserved_Room_Type 2 | 0.000000 |
| type_of_meal_plan_Meal Plan 3 | 0.000000 |
| no_of_previous_bookings_not_canceled | 0.000000 |
| no_of_previous_cancellations | 0.000000 |
| month_September | 0.000000 |
# plot importances
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
lead_time is still the most important feature, followed by no_of_special_requests.
clf = DecisionTreeClassifier(random_state=1, class_weight={0: 0.35, 1: 0.65})
path = clf.cost_complexity_pruning_path(X_train, y_train)
# abs() removes the tiny negative alphas (about -1e-20) caused by floating-point round-off
ccp_alphas, impurities = abs(path.ccp_alphas), path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000e+00 | 0.003830 |
| 1 | -1.355253e-20 | 0.003830 |
| 2 | -1.355253e-20 | 0.003830 |
| 3 | -1.355253e-20 | 0.003830 |
| 4 | -1.355253e-20 | 0.003830 |
| ... | ... | ... |
| 2799 | 8.072411e-03 | 0.333808 |
| 2800 | 8.391700e-03 | 0.342199 |
| 2801 | 1.374313e-02 | 0.355942 |
| 2802 | 3.387110e-02 | 0.423685 |
| 2803 | 7.598606e-02 | 0.499671 |
2804 rows × 2 columns
# plot total impurity vs effective alpha for training set
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs Effective alpha for Training Set")
plt.show()
Next, we train one decision tree for each effective alpha. The last value in ccp_alphas is the alpha that prunes the whole tree, leaving clfs[-1] with a single node.
clfs = []
for ccp_alpha in ccp_alphas:
clf = DecisionTreeClassifier(
random_state=1, ccp_alpha=ccp_alpha, class_weight={0: 0.35, 1: 0.65}
)
clf.fit(X_train, y_train)
clfs.append(clf)
print(
"Number of nodes in the last tree is {} with ccp_alpha: {}".format(
clfs[-1].tree_.node_count, ccp_alphas[-1]
)
)
Number of nodes in the last tree is 1 with ccp_alpha: 0.07598605813687509
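The same behaviour can be reproduced on a small example. The sketch below, on synthetic data with hypothetical names (`X_demo`, `demo_path`, `demo_alphas`, `demo_node_counts`), computes the pruning path, fits one tree per alpha and records the node counts. It uses `np.clip` rather than `abs` to remove round-off negatives; this is a substitution made for the sketch, not what the notebook does.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=1)

demo_path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_demo, y_demo
)
# Round-off can produce alphas such as -1e-20, and ccp_alpha must be
# non-negative, so clip them to zero
demo_alphas = np.clip(demo_path.ccp_alphas, 0, None)

demo_node_counts = []
for alpha in demo_alphas:
    pruned = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha).fit(X_demo, y_demo)
    demo_node_counts.append(pruned.tree_.node_count)

print(demo_node_counts[0], "nodes at alpha = 0")
print(demo_node_counts[-1], "node(s) at the largest alpha")
```

Each alpha on the path is the threshold at which the next weakest subtree is collapsed, so the node count can only stay the same or fall as alpha grows.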
For the remainder, we remove the last element of clfs and ccp_alphas, because it is the trivial tree with only one node. Here we show that the number of nodes and the tree depth decrease as alpha increases.
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of Nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
recall_train = []
for clf in clfs:
pred_train = clf.predict(X_train)
values_train = recall_score(y_train, pred_train)
recall_train.append(values_train)
recall_test = []
for clf in clfs:
pred_test = clf.predict(X_test)
values_test = recall_score(y_test, pred_test)
recall_test.append(values_test)
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
# visualize recall vs alpha for training/test sets
fig, ax = plt.subplots(figsize=(15, 8))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs Alpha for Training and Test Sets")
ax.plot(
ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post",
)
ax.plot(
ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post",
)
ax.legend()
plt.show()
# select the model with the highest recall on the test set
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.00020705683409440352,
class_weight={0: 0.35, 1: 0.65}, random_state=1)
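The notebook picks the alpha with the highest recall on the test set. A possible alternative, shown here only as a sketch and not as the notebook's method, is to choose alpha by cross-validated recall on the training data so that the test set is used only for the final check. The data is synthetic and the names `X_demo`, `y_demo`, `demo_weights`, `candidate_alphas`, `cv_recall` and `best_alpha` are hypothetical; `X_demo` plays the role of the training split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in for the training split (an assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, weights=[0.67, 0.33], random_state=1
)
demo_weights = {0: 0.35, 1: 0.65}  # same class weights as in the notebook

demo_path = DecisionTreeClassifier(
    random_state=1, class_weight=demo_weights
).cost_complexity_pruning_path(X_demo, y_demo)
# drop the last alpha (single-node tree) and clip round-off negatives to zero
candidate_alphas = np.unique(np.clip(demo_path.ccp_alphas[:-1], 0, None))

# mean 5-fold recall for each candidate alpha
cv_recall = [
    cross_val_score(
        DecisionTreeClassifier(
            random_state=1, ccp_alpha=a, class_weight=demo_weights
        ),
        X_demo,
        y_demo,
        scoring="recall",
        cv=5,
    ).mean()
    for a in candidate_alphas
]
best_alpha = candidate_alphas[int(np.argmax(cv_recall))]
print("alpha with the highest cross-validated recall:", best_alpha)
```

As with `np.argmax(recall_test)` in the notebook, ties are resolved in favour of the first (smallest) alpha.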
best_model.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=0.00020705683409440352,
class_weight={0: 0.35, 1: 0.65}, random_state=1)
Checking performance on training set
# confusion matrix on training set using best model
confusion_matrix_statsmodels(best_model, X_train, y_train)
# training performance
print("Training performance:")
model_performance_classification_statsmodels(best_model, X_train, y_train)
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.819224 | 0.827569 | 0.69583 | 0.756003 |
Checking performance on test set
# confusion matrix on test set using best model
confusion_matrix_statsmodels(best_model, X_test, y_test)
# test performance
print("Test performance:")
model_performance_classification_statsmodels(best_model, X_test, y_test)
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.805242 | 0.811892 | 0.685175 | 0.743171 |
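As a quick consistency check on the performance tables, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does this for the recall, precision and F1 values reported in this notebook; the helper `f1_from` and the row labels ("tuned tree" for the grid-searched estimator, "pruned tree" for the cost-complexity-pruned one) are names chosen here for illustration.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (recall, precision, F1) as reported in the tables of this notebook
reported = {
    "tuned tree, train": (0.774697, 0.719096, 0.745862),
    "tuned tree, test": (0.76577, 0.712453, 0.73815),
    "pruned tree, train": (0.827569, 0.69583, 0.756003),
    "pruned tree, test": (0.811892, 0.685175, 0.743171),
}

for name, (recall, precision, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_from(precision, recall):.6f}, reported = {f1}")
```

The pruned tree trades some precision for recall relative to the tuned tree, and its train and test scores stay close to each other.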
# visualizing the tree
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
best_model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
# text report showing the rules of the decision tree
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 86.50 | | | | |--- avg_price_per_room <= 202.50 | | | | | |--- repeated_guest <= 0.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- avg_price_per_room <= 59.50 | | | | | | | | |--- weights: [35.70, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 59.50 | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | |--- weights: [195.30, 68.90] class: 0 | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | |--- weights: [0.00, 3.25] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- weights: [258.30, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- weights: [366.45, 64.35] class: 0 | | | | | |--- repeated_guest > 0.50 | | | | | | |--- weights: [133.00, 0.65] class: 0 | | | | |--- avg_price_per_room > 202.50 | | | | | |--- weights: [0.35, 4.55] class: 1 | | | |--- lead_time > 86.50 | | | | |--- avg_price_per_room <= 93.33 | | | | | |--- weights: [138.25, 44.85] class: 0 | | | | |--- avg_price_per_room > 93.33 | | | | | |--- no_of_week_nights <= 1.50 | | | | | | |--- weights: [12.60, 18.85] class: 1 | | | | | |--- no_of_week_nights > 1.50 | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | |--- weights: [11.55, 14.30] class: 1 | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | |--- weights: [29.40, 8.45] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 9.50 | | | | |--- avg_price_per_room <= 200.38 | | | | | |--- lead_time <= 2.50 | | | | | | |--- weights: [238.35, 33.15] class: 0 | | | | | |--- lead_time > 2.50 | | | | | | |--- avg_price_per_room <= 99.42 | | | | | | | |--- month_January <= 0.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- weights: [63.70, 17.55] class: 0 | | | | | | | | |--- 
no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [14.00, 13.65] class: 0 | | | | | | | |--- month_January > 0.50 | | | | | | | | |--- weights: [26.25, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 99.42 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- weights: [45.15, 24.05] class: 0 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [36.75, 60.45] class: 1 | | | | |--- avg_price_per_room > 200.38 | | | | | |--- weights: [0.70, 38.35] class: 1 | | | |--- lead_time > 9.50 | | | | |--- required_car_parking_space <= 0.50 | | | | | |--- avg_price_per_room <= 105.28 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- month_January <= 0.50 | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 63.18 | | | | | | | | | | |--- weights: [7.70, 0.65] class: 0 | | | | | | | | | |--- avg_price_per_room > 63.18 | | | | | | | | | | |--- weights: [69.30, 96.20] class: 1 | | | | | | | | |--- month_December > 0.50 | | | | | | | | | |--- weights: [23.45, 0.00] class: 0 | | | | | | | |--- month_January > 0.50 | | | | | | | | |--- weights: [27.65, 0.00] class: 0 | | | | | | |--- lead_time > 25.50 | | | | | | | |--- avg_price_per_room <= 59.44 | | | | | | | | |--- month_July <= 0.50 | | | | | | | | | |--- weights: [20.30, 7.80] class: 0 | | | | | | | | |--- month_July > 0.50 | | | | | | | | | |--- weights: [1.40, 5.20] class: 1 | | | | | | | |--- avg_price_per_room > 59.44 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 74.69 | | | | | | | | | | |--- weights: [32.20, 39.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 74.69 | | | | | | | | | | |--- weights: [141.05, 328.90] class: 1 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- weights: [86.45, 273.00] class: 1 | | | | | |--- avg_price_per_room > 105.28 | | | | | | |--- avg_price_per_room <= 200.35 | | | | | | | |--- month_December <= 0.50 | | 
| | | | | | |--- no_of_children <= 0.50 | | | | | | | | | |--- month_January <= 0.50 | | | | | | | | | | |--- weights: [345.10, 1153.10] class: 1 | | | | | | | | | |--- month_January > 0.50 | | | | | | | | | | |--- lead_time <= 34.50 | | | | | | | | | | | |--- weights: [4.90, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 34.50 | | | | | | | | | | | |--- weights: [2.80, 5.85] class: 1 | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | |--- weights: [33.25, 187.85] class: 1 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- lead_time <= 23.50 | | | | | | | | | |--- weights: [9.45, 0.00] class: 0 | | | | | | | | |--- lead_time > 23.50 | | | | | | | | | |--- weights: [5.95, 19.50] class: 1 | | | | | | |--- avg_price_per_room > 200.35 | | | | | | | |--- month_December <= 0.50 | | | | | | | | |--- weights: [0.35, 170.30] class: 1 | | | | | | | |--- month_December > 0.50 | | | | | | | | |--- weights: [1.75, 0.00] class: 0 | | | | |--- required_car_parking_space > 0.50 | | | | | |--- weights: [32.55, 1.95] class: 0 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- lead_time <= 7.50 | | | | |--- weights: [557.20, 49.40] class: 0 | | | |--- lead_time > 7.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 94.50 | | | | | | |--- weights: [193.90, 3.90] class: 0 | | | | | |--- lead_time > 94.50 | | | | | | |--- weights: [49.00, 13.65] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 121.97 | | | | | | |--- month_January <= 0.50 | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | |--- avg_price_per_room <= 67.41 | | | | | | | | | |--- weights: [78.40, 8.45] class: 0 | | | | | | | | |--- avg_price_per_room > 67.41 | | | | | | | | | |--- month_December <= 0.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | 
| | | | | | | | |--- weights: [26.60, 0.00] class: 0 | | | | | | | | | |--- month_December > 0.50 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- weights: [77.35, 1.30] class: 0 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- weights: [5.60, 13.65] class: 1 | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | |--- weights: [4.20, 18.20] class: 1 | | | | | | |--- month_January > 0.50 | | | | | | | |--- lead_time <= 99.00 | | | | | | | | |--- weights: [97.30, 0.65] class: 0 | | | | | | | |--- lead_time > 99.00 | | | | | | | | |--- weights: [3.85, 9.10] class: 1 | | | | | |--- avg_price_per_room > 121.97 | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- avg_price_per_room <= 179.62 | | | | | | | | | |--- lead_time <= 31.50 | | | | | | | | | | |--- weights: [161.70, 83.85] class: 0 | | | | | | | | | |--- lead_time > 31.50 | | | | | | | | | | |--- month_October <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- month_October > 0.50 | | | | | | | | | | | |--- weights: [14.00, 29.90] class: 1 | | | | | | | | |--- avg_price_per_room > 179.62 | | | | | | | | | |--- weights: [95.55, 118.30] class: 1 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- weights: [91.00, 133.90] class: 1 | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | |--- weights: [44.80, 0.00] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [1040.55, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- weights: [152.25, 33.80] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- weights: [1.40, 4.55] class: 1 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.50 | | | | | |--- avg_price_per_room <= 200.70 | | | | | | 
|--- avg_price_per_room <= 93.97 | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | |--- weights: [9.80, 1.30] class: 0 | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | |--- weights: [32.20, 39.65] class: 1 | | | | | | |--- avg_price_per_room > 93.97 | | | | | | | |--- weights: [141.40, 72.80] class: 0 | | | | | |--- avg_price_per_room > 200.70 | | | | | | |--- weights: [0.00, 14.95] class: 1 | | | | |--- no_of_special_requests > 2.50 | | | | | |--- weights: [59.85, 0.00] class: 0 |--- lead_time > 150.50 | |--- no_of_special_requests <= 2.50 | | |--- avg_price_per_room <= 100.04 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- lead_time <= 270.50 | | | | | |--- avg_price_per_room <= 89.76 | | | | | | |--- weights: [109.55, 27.30] class: 0 | | | | | |--- avg_price_per_room > 89.76 | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | |--- weights: [10.50, 16.25] class: 1 | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | |--- weights: [10.50, 1.95] class: 0 | | | | |--- lead_time > 270.50 | | | | | |--- lead_time <= 361.50 | | | | | | |--- weights: [19.25, 24.05] class: 1 | | | | | |--- lead_time > 361.50 | | | | | | |--- weights: [2.45, 24.05] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- avg_price_per_room <= 2.50 | | | | | | |--- weights: [3.50, 4.55] class: 1 | | | | | |--- avg_price_per_room > 2.50 | | | | | | |--- weights: [2.80, 419.90] class: 1 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- lead_time <= 180.50 | | | | | | | |--- weights: [28.35, 12.35] class: 0 | | | | | | |--- lead_time > 180.50 | | | | | | | |--- weights: [4.55, 125.45] class: 1 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- lead_time <= 198.50 | | | | | | | |--- weights: [64.75, 41.60] class: 0 | | | | | | |--- lead_time > 198.50 | | | | | | | |--- weights: [70.00, 83.85] class: 1 | | 
|--- avg_price_per_room > 100.04 | | | |--- month_December <= 0.50 | | | | |--- weights: [3.50, 1715.35] class: 1 | | | |--- month_December > 0.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- weights: [13.30, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- weights: [3.85, 11.70] class: 1 | |--- no_of_special_requests > 2.50 | | |--- weights: [74.90, 0.00] class: 0
# importance of features in the tree building
print(
    pd.DataFrame(
        best_model.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.439063
no_of_special_requests                0.202773
market_segment_type_Online            0.182503
avg_price_per_room                    0.092952
required_car_parking_space            0.016974
month_December                        0.013761
no_of_weekend_nights                  0.011356
no_of_week_nights                     0.010367
month_January                         0.009073
type_of_meal_plan_Not Selected        0.004676
market_segment_type_Offline           0.002562
month_July                            0.002264
month_February                        0.002154
month_October                         0.001394
repeated_guest                        0.001207
month_November                        0.001080
no_of_adults                          0.000988
month_May                             0.000918
arrival_date                          0.000916
month_August                          0.000825
month_March                           0.000822
no_of_children                        0.000773
room_type_reserved_Room_Type 4        0.000599
month_June                            0.000000
room_type_reserved_Room_Type 5        0.000000
market_segment_type_Corporate         0.000000
market_segment_type_Complementary     0.000000
room_type_reserved_Room_Type 7        0.000000
room_type_reserved_Room_Type 6        0.000000
room_type_reserved_Room_Type 3        0.000000
room_type_reserved_Room_Type 2        0.000000
type_of_meal_plan_Meal Plan 3         0.000000
type_of_meal_plan_Meal Plan 2         0.000000
no_of_previous_bookings_not_canceled  0.000000
no_of_previous_cancellations          0.000000
month_September                       0.000000
# plot feature importances
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
This model is still fairly complex and hard to interpret, so we fit a more heavily pruned tree with a larger ccp_alpha.
Creating a model with ccp_alpha = 0.005
best_model2 = DecisionTreeClassifier(
    ccp_alpha=0.005, class_weight={0: 0.35, 1: 0.65}, random_state=1
)
best_model2.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=0.005, class_weight={0: 0.35, 1: 0.65},
random_state=1)
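For context, `ccp_alpha` is the complexity parameter of minimal cost-complexity pruning. The sketch below restates the criterion as described in the scikit-learn documentation; the symbols are the usual ones from that description and do not appear elsewhere in this notebook. $R(T)$ is the total sample-weighted impurity of the leaves of tree $T$, $|\widetilde{T}|$ is its number of leaves, and $T_t$ is the branch rooted at internal node $t$.

```latex
R_\alpha(T) = R(T) + \alpha\,\lvert \widetilde{T} \rvert,
\qquad
\alpha_{\text{eff}}(t) = \frac{R(t) - R(T_t)}{\lvert \widetilde{T}_t \rvert - 1}
```

The internal node with the smallest $\alpha_{\text{eff}}$ is pruned first, and pruning stops once the smallest remaining $\alpha_{\text{eff}}$ exceeds `ccp_alpha`. A larger `ccp_alpha` such as 0.005 therefore removes more branches and leaves a smaller tree.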
# confusion matrix for training set using best model with chosen alpha
confusion_matrix_statsmodels(best_model2, X_train, y_train)
# training performance
decision_tree_postpruned_perf_train = model_performance_classification_statsmodels(
    best_model2, X_train, y_train
)
print("Training performance:")
decision_tree_postpruned_perf_train
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.795244 | 0.733552 | 0.684186 | 0.70801 |
# confusion matrix for test set using best model with chosen alpha
confusion_matrix_statsmodels(best_model2, X_test, y_test)
# test performance
decision_tree_postpruned_perf_test = model_performance_classification_statsmodels(
    best_model2, X_test, y_test
)
print("Test performance:")
decision_tree_postpruned_perf_test
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.793158 | 0.738639 | 0.688224 | 0.712541 |
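As a quick sanity check on the post-pruned tree's reported scores, F1 can be recomputed as the harmonic mean of precision and recall. This is a minimal sketch: the numbers are copied by hand from the training and test performance tables, and `f1_from` is a helper name introduced here for illustration only.

```python
# Metrics copied by hand from the post-pruned tree's performance tables.
reported = {
    "train": {"recall": 0.733552, "precision": 0.684186, "f1": 0.708010},
    "test": {"recall": 0.738639, "precision": 0.688224, "f1": 0.712541},
}


def f1_from(precision: float, recall: float) -> float:
    """Return F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for split, metrics in reported.items():
    recomputed = f1_from(metrics["precision"], metrics["recall"])
    print(
        f"{split}: reported F1 = {metrics['f1']:.6f}, "
        f"recomputed F1 = {recomputed:.6f}"
    )
```

The recomputed values should agree with the tables up to rounding of the displayed precision and recall.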
# visualize the tree
plt.figure(figsize=(15, 10))
out = tree.plot_tree(
    best_model2,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# draw the arrows between nodes in black so they stay visible
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# text report showing the rules of a decision tree -
print(tree.export_text(best_model2, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [1180.90, 228.15] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- lead_time <= 9.50
|   |   |   |   |--- weights: [424.90, 187.20] class: 0
|   |   |   |--- lead_time > 9.50
|   |   |   |   |--- weights: [845.60, 2289.30] class: 1
|   |--- no_of_special_requests > 0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- lead_time <= 7.50
|   |   |   |   |--- weights: [557.20, 49.40] class: 0
|   |   |   |--- lead_time > 7.50
|   |   |   |   |--- weights: [2017.05, 1110.85] class: 0
|   |   |--- no_of_special_requests > 1.50
|   |   |   |--- weights: [1437.45, 167.05] class: 0
|--- lead_time > 150.50
|   |--- no_of_special_requests <= 2.50
|   |   |--- avg_price_per_room <= 100.04
|   |   |   |--- weights: [326.20, 781.30] class: 1
|   |   |--- avg_price_per_room > 100.04
|   |   |   |--- weights: [20.65, 1727.05] class: 1
|   |--- no_of_special_requests > 2.50
|   |   |--- weights: [74.90, 0.00] class: 0
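The `weights` in each leaf are class-weighted totals, not raw booking counts. The sketch below assumes they are training-set totals scaled by the `class_weight={0: 0.35, 1: 0.65}` passed to `best_model2`; under that assumption, dividing by the class weight recovers the number of bookings per class in each leaf, and the training metrics can be rebuilt from the leaves alone. The leaf values are copied by hand from the text report, and `raw_counts` is a helper name introduced here.

```python
# Class weights passed to DecisionTreeClassifier for best_model2.
CLASS_WEIGHT = {0: 0.35, 1: 0.65}

# (weighted class-0 total, weighted class-1 total, predicted class),
# copied by hand from the export_text report of best_model2.
LEAVES = [
    (1180.90, 228.15, 0),
    (424.90, 187.20, 0),
    (845.60, 2289.30, 1),
    (557.20, 49.40, 0),
    (2017.05, 1110.85, 0),
    (1437.45, 167.05, 0),
    (326.20, 781.30, 1),
    (20.65, 1727.05, 1),
    (74.90, 0.00, 0),
]


def raw_counts(w0: float, w1: float) -> tuple[int, int]:
    """Undo the class weighting to get bookings per class in a leaf."""
    return round(w0 / CLASS_WEIGHT[0]), round(w1 / CLASS_WEIGHT[1])


tp = fp = fn = tn = 0
for w0, w1, predicted in LEAVES:
    not_canceled, canceled = raw_counts(w0, w1)
    if predicted == 1:
        # Every booking in this leaf is predicted as canceled.
        tp += canceled
        fp += not_canceled
    else:
        # Every booking in this leaf is predicted as not canceled.
        fn += canceled
        tn += not_canceled

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)

print(f"bookings: {total}, canceled: {tp + fn}")
print(f"accuracy={accuracy:.6f} recall={recall:.6f} precision={precision:.6f}")
```

If the assumption holds, these values reproduce the training accuracy, recall, and precision reported for the post-pruned tree (0.795244, 0.733552, 0.684186), which is a useful check that the printed tree and the performance table describe the same model.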
# importance of features in the tree building
print(
    pd.DataFrame(
        best_model2.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.510867
no_of_special_requests                0.242488
market_segment_type_Online            0.203417
avg_price_per_room                    0.043228
no_of_adults                          0.000000
market_segment_type_Corporate         0.000000
market_segment_type_Offline           0.000000
month_August                          0.000000
month_December                        0.000000
month_February                        0.000000
month_January                         0.000000
room_type_reserved_Room_Type 7        0.000000
month_July                            0.000000
month_June                            0.000000
month_March                           0.000000
month_May                             0.000000
month_November                        0.000000
month_October                         0.000000
market_segment_type_Complementary     0.000000
room_type_reserved_Room_Type 5        0.000000
room_type_reserved_Room_Type 6        0.000000
no_of_children                        0.000000
room_type_reserved_Room_Type 4        0.000000
room_type_reserved_Room_Type 3        0.000000
room_type_reserved_Room_Type 2        0.000000
type_of_meal_plan_Not Selected        0.000000
type_of_meal_plan_Meal Plan 3         0.000000
type_of_meal_plan_Meal Plan 2         0.000000
no_of_previous_bookings_not_canceled  0.000000
no_of_previous_cancellations          0.000000
repeated_guest                        0.000000
arrival_date                          0.000000
required_car_parking_space            0.000000
no_of_week_nights                     0.000000
no_of_weekend_nights                  0.000000
month_September                       0.000000
importances = best_model2.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
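Only four features have non-zero importance in `best_model2`. A small sketch of their cumulative share is shown below; the values are copied by hand from the printed importances, and `nonzero_importances` is a name introduced here so it does not overwrite the `importances` array used for the plot.

```python
# Non-zero importances copied by hand from the best_model2 output.
nonzero_importances = {
    "lead_time": 0.510867,
    "no_of_special_requests": 0.242488,
    "market_segment_type_Online": 0.203417,
    "avg_price_per_room": 0.043228,
}

# Accumulate importance from the most to the least important feature.
cumulative = {}
running_total = 0.0
ranked = sorted(nonzero_importances.items(), key=lambda kv: kv[1], reverse=True)
for name, importance in ranked:
    running_total += importance
    cumulative[name] = running_total
    print(f"{name:<28} {importance:.6f}  cumulative {running_total:.6f}")
```

The first two features account for roughly three quarters of the total importance, and the four together sum to 1 because scikit-learn normalizes `feature_importances_`.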
# training performance comparison
models_tree_train_comp_df = pd.concat(
    [
        decision_tree_perf_train.T,
        decision_tree_tune_perf_train.T,
        decision_tree_postpruned_perf_train.T,
    ],
    axis=1,
)
models_tree_train_comp_df.columns = [
    "Original Decision Tree",
    "Pre-Pruned Decision Tree",
    "Post-Pruned Decision Tree",
]
print("Training performance comparison:")
models_tree_train_comp_df
Training performance comparison:
| | Original Decision Tree | Pre-Pruned Decision Tree | Post-Pruned Decision Tree |
|---|---|---|---|
| Accuracy | 0.996200 | 0.821343 | 0.795244 |
| Recall | 1.000000 | 0.774697 | 0.733552 |
| Precision | 0.988894 | 0.719096 | 0.684186 |
| F1 | 0.994416 | 0.745862 | 0.708010 |
# test performance comparison
models_tree_test_comp_df = pd.concat(
    [
        decision_tree_perf_test.T,
        decision_tree_tune_perf_test.T,
        decision_tree_postpruned_perf_test.T,
    ],
    axis=1,
)
models_tree_test_comp_df.columns = [
    "Original Decision Tree",
    "Pre-Pruned Decision Tree",
    "Post-Pruned Decision Tree",
]
print("Test performance comparison:")
models_tree_test_comp_df
Test performance comparison:
| | Original Decision Tree | Pre-Pruned Decision Tree | Post-Pruned Decision Tree |
|---|---|---|---|
| Accuracy | 0.787979 | 0.811441 | 0.793158 |
| Recall | 0.694551 | 0.765770 | 0.738639 |
| Precision | 0.694551 | 0.712453 | 0.688224 |
| F1 | 0.694551 | 0.738150 | 0.712541 |
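The two comparison tables are easier to read side by side as a train-minus-test gap, which is a rough indicator of overfitting. The sketch below uses the F1 values copied by hand from the tables; the variable names are introduced here for illustration.

```python
# F1 scores copied by hand from the training and test comparison tables.
f1_train = {
    "Original Decision Tree": 0.994416,
    "Pre-Pruned Decision Tree": 0.745862,
    "Post-Pruned Decision Tree": 0.708010,
}
f1_test = {
    "Original Decision Tree": 0.694551,
    "Pre-Pruned Decision Tree": 0.738150,
    "Post-Pruned Decision Tree": 0.712541,
}

# A large positive gap means the model does much better on data it has seen.
f1_gap = {model: f1_train[model] - f1_test[model] for model in f1_train}
for model, gap in f1_gap.items():
    print(f"{model:<26} train-test F1 gap: {gap:+.6f}")

# Model with the highest F1 on the test set.
best_on_test = max(f1_test, key=f1_test.get)
print("Highest test F1:", best_on_test)
```

On these numbers the original tree shows a gap of about 0.30, while both pruned trees are within 0.01 of their training F1, and the pre-pruned tree has the highest test F1.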
lead_time, no_of_special_requests, market_segment_type_Online, and avg_price_per_room are the most important variables for predicting which bookings will be canceled; the post-pruned tree splits on these four features only.
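Because the post-pruned tree is so small, its decisions can be written out as plain rules. The function below is a hand transcription of the text report for `best_model2`, not code generated by scikit-learn; the name `predict_canceled` and the boolean `is_online` argument (standing in for `market_segment_type_Online`) are introduced here. The split on `avg_price_per_room` is omitted because both of its leaves predict class 1, so it changes leaf purity but not the predicted class.

```python
def predict_canceled(
    lead_time: float, no_of_special_requests: int, is_online: bool
) -> int:
    """Return 1 if the post-pruned tree predicts a cancellation, else 0."""
    if lead_time > 150.5:
        # Long lead times are predicted as canceled unless the guest
        # made three or more special requests.
        return 1 if no_of_special_requests <= 2.5 else 0
    # lead_time <= 150.5: only online bookings with no special requests
    # and a lead time above 9.5 days are predicted as canceled.
    if no_of_special_requests <= 0.5 and is_online and lead_time > 9.5:
        return 1
    return 0


# A few illustrative bookings (made-up inputs, not rows from the dataset).
examples = [
    (200, 0, False),
    (200, 3, True),
    (30, 0, True),
    (5, 0, True),
    (30, 1, True),
]
for lead, requests, online in examples:
    label = predict_canceled(lead, requests, online)
    print(f"lead_time={lead}, requests={requests}, online={online} -> {label}")
```

Written this way, the model's logic is easy to discuss with the business: long lead times and online bookings without special requests are the main cancellation signals, and special requests act as a sign of commitment.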